Draft DD-1996-0001 - Design Principles for XML

Date: August 25, 1996


XML stands for eXtensible Markup Language.

We expect the definition of XML to involve mostly changes to the SGML metalanguage, with perhaps some restrictions on DTDs or conventions about name usage in the absence of DTDs. Conventions for interchange and processing semantics may also be part of our work, but we expect that they will be addressed separately from the definition of XML.

List of Principles

This list of principles is presented in descending order of priority. We currently regard items #1 through #4 as "non-negotiable", required characteristics of any successful XML design.

1. XML shall be straightforwardly usable over the Internet.

This does not mean you can feed it to, for example, the Netscape of today, but that the design will have regard at all times to the needs of distributed applications working on large-scale networks.

2. XML shall support a wide variety of applications.

No design elements shall be adopted which would impair the usability of XML documents in other contexts such as print or CD-ROM, nor in applications other than network browsing. Examples of such applications include:

3. XML shall be compatible with SGML.

  1. Existing SGML tools will be able to read and write XML data.
  2. XML instances are SGML documents as they are, without changes to the instance.
  3. For any XML document, a DTD can be generated such that SGML will produce "the same parse" as would an XML processor.
  4. XML should have essentially the same expressive power as SGML.

Note: Sub-items 1 and 2 above describe our goal in its ideal form. If this goal is not achievable in its fullest form, then we may back out to a weaker form: it shall be simple to transform XML documents into equivalent SGML documents, and vice versa. Our intention, however, is to bite the bullet and ensure, if we can, that no transformation is needed to allow SGML tools to read and write XML document instances.

Sub-items 3 and 4 above indicate our intentions accurately, but it is not yet clear how best to formalize and explain the phrase "the same parse", or the phrase "essentially the same expressive power". These remain open questions; see also principle 8.
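As an illustration of sub-item 3 only, the following Python sketch derives a deliberately permissive DTD from a document instance: every element type found is declared with content ANY, and every attribute found is declared CDATA #IMPLIED. The function name permissive_dtd and the sample document are hypothetical, the declaration syntax is assumed to be the plain form without SGML tag-minimization parameters, and element names are assumed to carry no namespace prefixes. Whether a DTD this loose yields "the same parse" depends on how that phrase is eventually formalized.

```python
import xml.etree.ElementTree as ET


def permissive_dtd(xml_text):
    """Build a loose DTD (hypothetical helper): elements ANY, attributes CDATA #IMPLIED."""
    root = ET.fromstring(xml_text)

    # Collect every element type and the attribute names seen on it.
    attrs_by_element = {}
    for elem in root.iter():
        attrs_by_element.setdefault(elem.tag, set()).update(elem.attrib)

    # Emit one ELEMENT declaration per type, followed by its ATTLIST declarations.
    lines = [f"<!DOCTYPE {root.tag} ["]
    for name in sorted(attrs_by_element):
        lines.append(f"  <!ELEMENT {name} ANY>")
        for attr in sorted(attrs_by_element[name]):
            lines.append(f"  <!ATTLIST {name} {attr} CDATA #IMPLIED>")
    lines.append("]>")
    return "\n".join(lines)


# Hypothetical sample instance, used only for demonstration.
SAMPLE_INSTANCE = '<memo priority="high"><to>WG</to><body>Hi <em>all</em></body></memo>'
dtd_text = permissive_dtd(SAMPLE_INSTANCE)
print(dtd_text)
```

A DTD generated this way constrains nothing beyond the names in use, so it is best read as a lower bound on what "a DTD can be generated" requires, not as a proposal for how generation should work.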

4. It shall be easy to write programs which process XML documents.

In particular, it shall be straightforwardly possible to construct useful XML applications which do not read, or need to read, the DTD of the XML document.

Note: For this purpose, "easy" means that the holder of a CS bachelor's degree should be able to construct basic processing (parsing, if not validating) machinery in less than a week, and that the major difficulty in the application should be the application-specific functions; XML should not add to the inherent difficulties of writing such applications.
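To make the idea of processing without a DTD concrete, here is a minimal Python sketch that extracts structure and content from a document instance alone. It relies on the standard library's xml.etree.ElementTree parser, which is a modern tool and not something this draft prescribes; the sample document, its element names, and the function name summarize are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document. It has no DOCTYPE declaration,
# so there is no DTD for the program to read.
SAMPLE = """<memo priority="high">
  <to>Working Group</to>
  <subject>Design principles</subject>
  <body>Ten principles, in descending order of priority.</body>
</memo>"""


def summarize(xml_text):
    """Return (root tag, root attributes, {child tag: text}) from the instance alone."""
    root = ET.fromstring(xml_text)
    children = {child.tag: (child.text or "").strip() for child in root}
    return root.tag, dict(root.attrib), children


tag, attrs, fields = summarize(SAMPLE)
print(tag, attrs, fields["subject"])
```

Everything the program needs (element names, attributes, nesting, and text) comes from the tags in the instance itself, and the application-specific part is confined to the few lines inside summarize.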

5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.

As a result of this, any XML document has a high probability of being handled successfully by any XML processor.

6. XML documents should be human-legible and reasonably clear.

7. The XML design should be prepared quickly.

A first draft of the XML design should be ready for distribution and comment by the end of 1996; a version should be ready for production use by the end of March 1997.

8. The design of XML shall be formal and concise.

XML should be simple and easy for implementors to grasp; its reference documentation should not exceed 20 pages, which should contain mostly formal grammar and very little normative text, if any. Note: normative text is not the same as descriptive or explanatory text.

XML shall specify clearly what characteristics of the input must be represented in the parse tree of an XML document, and what characteristics need not be captured by XML processors. This means that the set of properties that are "significant" in an XML application will be defined both formally and informally. Which properties are significant and which are insignificant remains an open question.

9. XML documents shall be easy to create.

It should be a straightforward task (though possibly labor-intensive) to create valid XML documents by hand (i.e. without a validating authoring tool).

It should be a straightforward task (though possibly labor-intensive) to create a validating XML authoring system.

10. Terseness is of minimal importance.

Minimizing keystrokes is not deemed important in achieving any of the above goals, but other things being equal, a concise notation should be preferred to a verbose one.