So You Wanna Be An XML Developer
Tim Bray
Principal, Textuality
tbray@textuality.com http://www.textuality.com
For: WWW7
Knowledge is a text-based application.

Road Map
  • General Info
  • Sample Data
  • Specs
  • Processors
  • APIs
  • Editing
  • Delivering
Save This URL

http://www.textuality.com/WWW7

General Info
  • http://www.w3.org/XML/ XML's home base at the W3C
  • http://www.sil.org/sgml/xml.html Robin Cover's wonderful web page
  • http://www.xml.com from O'Reilly and Seybold
  • http://www.ucc.ie/xml Peter Flynn's XML FAQ
Sample Data
  • http://sunsite.unc.edu/pub/sun-info/xml/eg/ Jon Bosak's Shakespeare and Religion texts
  • http://www.jclark.com/xml James Clark's well-formedness test suite
  • http://europa.eu.int/xml-testfiles Office des Publications Officielles des Communautés Européennes (EU); documents in 11 languages
The Specs
  • http://www.w3.org/TR/REC-xml The official version at W3C; available in HTML, RTF, PostScript, PDF, and XML
  • http://www.xml.com/axml/axml.html Annotated version
Why Use An XML Processor?

A conforming XML processor takes care of:

  • Picking apart tags:
      <a href='&home;/&art;/madonna.html'
        ><img src
         = "madonna.jpg"    alt='Mus&#xe9;e du Louvre'
        /></a>
  • Turning native encodings into Unicode
  • Normalizing line-ends
  • Expanding internal entities and supplying default attribute values
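
A minimal sketch of these four jobs, assuming Python's standard xml.parsers.expat binding to Expat (the slides do not prescribe a language or a parser). The entity values, the default border attribute, and the caption element are invented for illustration; only the a/img tags come from the slide.

```python
import xml.parsers.expat

# Hypothetical document: the DOCTYPE contents and <caption> are assumptions
# made up for this sketch. The bytes are ISO-8859-1 with CR LF line ends.
DOC = (
    "<?xml version='1.0' encoding='ISO-8859-1'?>\r\n"
    "<!DOCTYPE a [\r\n"
    "  <!ENTITY home 'http://example.org'>\r\n"
    "  <!ENTITY art 'art'>\r\n"
    "  <!ATTLIST img border CDATA '0'>\r\n"
    "]>\r\n"
    "<a href='&home;/&art;/madonna.html'\r\n"
    "    ><img src\r\n"
    "     = \"madonna.jpg\"    alt='Mus&#xe9;e du Louvre'\r\n"
    "    /><caption>Mus\u00e9e\r\ndu Louvre</caption></a>\r\n"
).encode("iso-8859-1")


def parse(data):
    """Run the processor over raw bytes and collect the events it reports."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver adjacent character data in one piece
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name, attrs))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.CharacterDataHandler = lambda text: events.append(("text", text))
    parser.Parse(data, True)
    return events


EVENTS = parse(DOC)

if __name__ == "__main__":
    for event in EVENTS:
        print(event)
```

The application never sees the odd whitespace inside the tags, the entity references, the byte encoding, or the CR LF pairs; it gets element names, attribute dictionaries (with the declared default filled in), and Unicode text with plain newlines.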
XML Processors in Java (1)
  • http://www.microstar.com/XML/ Ælfred, from Microstar; small (25K), non-validating, non-conformant
  • http://www.datachannel.com/products/xml/DXP/ DXP, from DataChannel; large, validating
  • http://www.textuality.com/Lark Lark/Larval, from T. Bray; smallish (45K), validating, highly conformant, good error messages
XML Processors in Java (2)
  • http://www.microsoft.com/workshop/author/xml/parser/ MSXML, from Microsoft; validating, smallish (< 100K), a little behind the spec right now
  • http://www.alphaworks.ibm.com/formula/xml XML for Java, from IBM Japan; large, incrementally validating
  • http://www.jclark.com/xml/xp/index.html XP, from J. Clark; large, fast, non-validating, highly conformant
XML Processors in C
  • http://www.jclark.com/xml/expat.html Expat, from James Clark; unbelievably fast, highly conformant, non-validating, integrated with Mozilla and perl
  • http://www.ltg.ed.ac.uk/software/xml/ LTXML, from Henry Thompson and the LTG at the University of Edinburgh; non-validating, optimized for pipeline/stream processing
  • http://www.microsoft.com/xml/cparser.htm MSXML, from Microsoft; non-validating, inside IE4
... in Other Languages
  • http://tcltk.anu.edu.au/XML/ TCL from ANU
  • http://www.stud.ifi.uio.no/~larsga/download/python/xml/xmlproc.html Python from Lars Marius Garshol
  • http://www.jeremie.com/Dev/XML/ XParse, < 5K of JavaScript
APIs for XML
  • http://www.microstar.com/XML/SAX/ SAX (Simple API for XML) from D. Megginson and the xml-dev gang. Event-stream based, mostly for Java.
  • http://www.w3.org/TR/WD-DOM/ DOM (Document Object Model), a W3C Activity. Language-independent, platform-independent, covers HTML & CSS too. Bindings in Java, ECMAScript, and IDL. Microsoft and Netscape both on board.
  • ... or, build your own.
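
To make the two API styles concrete, here is a rough sketch using Python's standard xml.sax and xml.dom.minidom modules. Both the choice of Python and the tiny play document are assumptions for illustration; the slides point at the Java SAX distribution and the W3C DOM draft, and the helper names below are hypothetical.

```python
import xml.dom.minidom
import xml.sax

# Hypothetical sample document, invented for this sketch.
DOC = b"<play><title>Hamlet</title><act n='1'/><act n='2'/></play>"


class ActCollector(xml.sax.ContentHandler):
    """SAX style: the parser pushes events at us; we keep only what we need."""

    def __init__(self):
        super().__init__()
        self.acts = []

    def startElement(self, name, attrs):
        if name == "act":
            self.acts.append(attrs.getValue("n"))


def acts_via_sax(data):
    """Stream through the document once; no tree is ever built."""
    handler = ActCollector()
    xml.sax.parseString(data, handler)
    return handler.acts


def acts_via_dom(data):
    """DOM style: build the whole tree first, then query it."""
    doc = xml.dom.minidom.parseString(data)
    return [el.getAttribute("n") for el in doc.getElementsByTagName("act")]


if __name__ == "__main__":
    print(acts_via_sax(DOC))
    print(acts_via_dom(DOC))
```

The event-stream version holds only the current event in memory, which suits large documents and one-pass extraction; the tree version costs memory proportional to the document but allows random access and modification.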
Avoid the Parser, Use Perl
  • ftp://www.wall.org/pub/larry/xmlparser-0.0.tar.gz XML::Parser module for perl 5, work in progress
  • perl is getting Unicode support in parallel with this activity
How to Author XML for Free
  • http://home.sprynet.com/sprynet/dmeggins/psgmlxml-19971208.zip PSGML-mode for GNU Emacs, adapted for XML
  • T. Bray's XML-mode for GNU Emacs (email me)
  • ftp://ftp.cogsci.ed.ac.uk/pub/ht/xed.zip Henry Thompson's XED mini-editor
How to Author XML & Pay For It
  • http://www.adobe.com/prodindex/framemaker/ Adobe FrameMaker, XML export
  • http://www.arbortext.com ArborText, existing SGML vendor
  • http://www.sq.com SoftQuad, existing SGML vendor
Delivering XML
  • http://www.inso.com/ Inso DynaText, DynaBase, DynaWeb. Native SGML/XML viewing and HTML query/generation.
  • http://www.microsoft.com/xml in Microsoft IE4
  • http://www.mozilla.org/rdf/doc/xml.html in Netscape Mozilla
  • http://www.sq.com/products/pc-pview.htm SoftQuad's Panorama; native SGML/XML viewing.