So You Wanna Be An XML Developer
Tim Bray
Principal, Textuality
For: WWW7
Knowledge is a text-based application.

Road Map
  • General Info
  • Sample Data
  • Specs
  • Processors
  • APIs
  • Editing
  • Delivering
Save This URL

General Info
  • XML's home base at the W3C
  • Robin Cover's wonderful web page
  • from O'Reilly and Seybold
  • Peter Flynn's XML FAQ
Sample Data
  • Jon Bosak's Shakespeare and Religion texts
  • James Clark's well-formedness test suite
  • Documents in 11 languages from the Office des publications officielles des Communautés européennes (the EU's Office for Official Publications of the European Communities)
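A test suite like James Clark's is only useful if something runs it. The sketch below assumes you drive a processor over (document, expected verdict) pairs. It uses Expat, through Python's standard xml.parsers.expat module, as the checker. The function names and the handful of test cases are hypothetical and are not taken from the actual suite, whose file layout is not described here.

```python
import xml.parsers.expat


def is_well_formed(data: bytes) -> bool:
    """Return True if Expat parses data to the end without a fatal error."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True: this is the final (only) chunk
    except xml.parsers.expat.ExpatError:
        return False
    return True


# Hypothetical mini test cases in the spirit of a well-formedness suite.
CASES = [
    (b"<doc/>", True),
    (b"<doc>text</doc>", True),
    (b"<doc>", False),               # element never closed
    (b"<doc></dc>", False),          # mismatched end tag
    (b"<doc/><doc/>", False),        # more than one root element
    (b"<doc a='1' a='2'/>", False),  # duplicate attribute
]


def run_suite(cases):
    """Return the documents whose verdict disagrees with the expected one."""
    return [doc for doc, expected in cases if is_well_formed(doc) != expected]
```

An empty list from `run_suite(CASES)` means the processor agreed with every expected verdict. Expat does not fetch external entities by default, so this checks only the document entity itself.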
The Specs
  • The official version at W3C; available in HTML, RTF, PostScript, PDF, and XML
  • Annotated version
Why Use An XML Processor?

A conforming XML processor takes care of:

  • Picking apart tags:
      <a href='&home;/&art;/madonna.html'
      ><img src
         = "madonna.jpg"    alt='Mus&#xe9;e du Louvre'
  • Turning native encodings into Unicode
  • Normalizing line-ends
  • Expanding internal entities and supplying default attribute values
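To see what "takes care of" means in practice, here is a sketch that feeds a processor (Expat, via Python's xml.parsers.expat) a document built around the tag fragment above. The values of `&home;` and `&art;`, the `border` attribute default, and the element text are invented for illustration, and the `\r\n` line ends stand in for a file saved on DOS/Windows. The application never sees raw markup. It receives element names and attribute values with entities and character references expanded, the declared default filled in, and line ends normalized.

```python
import xml.parsers.expat

# Hypothetical document around the slide's <a>/<img> fragment.
# Entity values, the "border" default, and the text are made up.
DOC = (
    "<!DOCTYPE a [\r\n"
    "  <!ENTITY home 'http://www.example.org'>\r\n"
    "  <!ENTITY art 'art'>\r\n"
    "  <!ATTLIST img border CDATA '0'>\r\n"
    "]>\r\n"
    "<a href='&home;/&art;/madonna.html'\r\n"
    "><img src \r\n"
    "   = \"madonna.jpg\"    alt='Mus&#xe9;e du Louvre'\r\n"
    "/>first line\r\nsecond line</a>"
)


def parse(doc):
    """Return (elements, text): each start tag with its attributes, plus all character data."""
    elements, chunks = [], []
    parser = xml.parsers.expat.ParserCreate()
    # specified_attributes is False by default, so defaults declared in the
    # internal DTD subset are reported alongside the attributes actually written.
    parser.StartElementHandler = lambda name, attrs: elements.append((name, attrs))
    # Character data can arrive in several chunks; collect and join them.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(doc, True)
    return elements, "".join(chunks)


elements, text = parse(DOC)
```

After parsing, `elements[1]` is `("img", {"src": "madonna.jpg", "alt": "Musée du Louvre", "border": "0"})` and `text` contains a plain `\n` where the source had `\r\n`.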
XML Processors in Java (1)
  • Ælfred, from Microstar; small (25K), non-validating, non-conformant
  • DXP, from DataChannel; large, validating
  • Lark/Larval, from T. Bray; smallish (45K), validating, highly conformant, good error messages
XML Processors in Java (2)
  • MSXML, from Microsoft; validating, smallish (< 100K), a little behind the spec right now
  • XML for Java, from IBM Japan; large, incrementally validating
  • XP, from J. Clark; large, fast, non-validating, highly conformant
XML Processors in C
  • Expat, from James Clark; unbelievably fast, highly conformant, non-validating, integrated with Mozilla and Perl.
  • LTXML, from Henry Thompson and the Language Technology Group (LTG), University of Edinburgh; non-validating, optimized for pipeline/stream processing
  • MSXML, from Microsoft; non-validating, inside IE4
XML Processors in Other Languages
  • Tcl, from ANU
  • Python, from Lars Marius Garshol
  • XParse, < 5K of JavaScript
APIs for XML
  • SAX (Simple API for XML) from D. Megginson and the xml-dev gang. Event-stream based, mostly for Java.
  • DOM (Document Object Model), a W3C Activity. Language-independent, platform-independent, covers HTML & CSS too. Interfaces defined in OMG IDL, with Java and ECMAScript bindings. Microsoft and Netscape both on board.
  • ... or, build your own.
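The practical difference between the two APIs is who holds the document. With SAX, the parser streams events at your handler and you keep only the state you need. With DOM, the parser builds the whole tree in memory and you navigate it afterwards. The sketch below pulls the same data both ways using Python's standard `xml.sax` and `xml.dom.minidom`, which follow the SAX and DOM models rather than the Java originals. The play fragment, the element names, and the helper names are illustrative assumptions.

```python
import xml.sax
from xml.dom import minidom

# Tiny hypothetical play fragment used as input for both approaches.
DOC = (b"<SPEECH><SPEAKER>HAMLET</SPEAKER>"
       b"<LINE>To be, or not to be: that is the question:</LINE>"
       b"<LINE>Whether 'tis nobler in the mind to suffer</LINE></SPEECH>")


class LineCollector(xml.sax.ContentHandler):
    """SAX style: react to start/characters/end events as they stream past."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._buf = None  # text chunks while inside a LINE element, else None

    def startElement(self, name, attrs):
        if name == "LINE":
            self._buf = []

    def characters(self, content):
        # May be called several times for one run of text, so accumulate.
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == "LINE":
            self.lines.append("".join(self._buf))
            self._buf = None


def lines_via_sax(data):
    handler = LineCollector()
    xml.sax.parseString(data, handler)
    return handler.lines


def lines_via_dom(data):
    """DOM style: build the whole tree first, then walk it."""
    doc = minidom.parseString(data)
    return ["".join(node.data for node in line.childNodes
                    if node.nodeType == node.TEXT_NODE)
            for line in doc.getElementsByTagName("LINE")]
```

Both functions return the two LINE texts. The SAX version's memory use does not grow with document size, while the DOM version lets you revisit any part of the tree.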
Avoid the Parser, Use Perl
  • XML::Parser module for Perl 5, work in progress
  • Perl is getting Unicode support in parallel with this work
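XML::Parser is a Perl interface to James Clark's Expat. As with Expat itself, you register callbacks and the parser calls them. The Unicode point matters because Expat hands the application text in one internal encoding regardless of how the file was stored. Perl is not among the languages used for examples here, so the sketch below uses Python's xml.parsers.expat, which wraps the same C library. The document and the `collect_text` helper are hypothetical. The sketch shows one string stored in three encodings Expat understands natively, with the parser returning identical Unicode text for all three.

```python
import xml.parsers.expat


def collect_text(data: bytes) -> str:
    """Register a character-data callback with Expat and return all text as one str."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()  # encoding taken from BOM/declaration
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)
    return "".join(chunks)


TEXT = "Musée du Louvre"

# The same tiny document serialized as UTF-8, UTF-16 (with BOM), and Latin-1.
DOCS = {
    enc: ("<?xml version='1.0' encoding='%s'?><p>%s</p>" % (enc, TEXT)).encode(enc)
    for enc in ("UTF-8", "UTF-16", "ISO-8859-1")
}
```

The byte strings in `DOCS` all differ, but `collect_text` returns `"Musée du Louvre"` for each one.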
How to Author XML for Free
  • PSGML-mode for GNU Emacs, adapted for XML
  • T. Bray's XML-mode for GNU Emacs (email me)
  • Henry Thompson's XED mini-editor
How to Author XML & Pay For It
  • Adobe FrameMaker, XML export
  • ArborText, existing SGML vendor
  • SoftQuad, existing SGML vendor
Delivering XML
  • Inso DynaText, DynaBase, DynaWeb. Native SGML/XML viewing and HTML query/generation.
  • in Microsoft IE4
  • in Netscape Mozilla
  • SoftQuad's Panorama; native SGML/XML viewing.