Architectural Theses on Namespaces and Namespace Documents

Tim Bray, February 2002

A namespace name is defined to be a URI reference. Some namespace names may be dereferenced; when this is done, the result is referred to as the namespace document.

In December 2000, I co-edited a proposal for Resource Directory Description Language (RDDL), an extension of XHTML with a <rddl:resource> element, designed for use in namespace documents. This document is an outline of the architectural principles which led to the design of RDDL, although they were not thought through in this level of detail at that time.

This document represents only the opinion of its author and has not been reviewed or approved by any other person or organization.

1. It is not strictly necessary for namespace documents to exist.

The only definitive document for namespaces with official standing is the W3C Recommendation Namespaces in XML, which explains that namespaces exist for the purpose of aiding software in recognizing markup vocabularies and avoiding collisions between identical names in different markup vocabularies. The existence of a namespace document is not necessary to achieve this purpose.

Furthermore, there are many widely-deployed namespaces (e.g. for Microsoft Office) whose namespace names are URNs and for which there exists no namespace document; these namespaces nonetheless effectively serve the purpose of enabling software to recognize markup and to avoid name collisions. This is an existence proof that namespace documents are not strictly necessary.
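A minimal sketch of that purpose, in Python: two vocabularies each define an element called "table", and a namespace-aware parser keeps them apart without retrieving anything. The "urn:example:" namespace names and the element content are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Both vocabularies are hypothetical; neither namespace name is ever dereferenced.
DOC = """\
<order xmlns:f="urn:example:furniture" xmlns:d="urn:example:data">
  <f:table>oak, four legs</f:table>
  <d:table>rows and columns</d:table>
</order>"""

root = ET.fromstring(DOC)

# ElementTree expands each qualified name to "{namespace-name}local-name",
# so the two "table" elements remain distinct.
tags = [child.tag for child in root]
print(tags)
```

The parser needs only the namespace names themselves; whether anything exists at the other end of them does not enter into it.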

It is indicative that the Recommendation goes out of its way to assert that it is not a goal that a namespace name be directly usable to retrieve a schema. This may be read either as a general statement about the desirability of namespace documents, or as a more specific statement about the suitability of schemas for use as namespace documents.

2. Namespaces vary widely in semantic effect.

In general, what can we conclude from the fact that an XML element is in a particular namespace? In particular, what can we conclude from the namespace of the root element of an XML document?

The answer varies widely from namespace to namespace. Some namespaces make strong claims as to the content and structure. Examples would include MathML and SVG, which come with very complete descriptions of their allowed content and semantics.

Another class of namespaces includes XLink and the namespace hardwired to the prefix "xml:", which define a small number of attributes but allow great freedom in the content of elements.

An interesting and common intermediate class is a namespace that makes some assertions about the syntax and semantics of its content, but is explicitly designed to allow the inclusion of content with markup from other namespaces. Perhaps the best-known example would be XHTML.

3. Namespaces have definitive material.

Every namespace currently in use has definitive material; it is difficult to imagine a namespace for which there exists none. In the case of the namespace hardwired to "xml:", the material consists of a few paragraphs of text in a couple of W3C recommendations. In the case of XHTML, it includes 3 DTDs and a large volume of descriptive prose. In the case of the Microsoft Office-related namespaces, it consists of a quantity of documentation which may be found with considerable difficulty, but not reliably bookmarked, in the bowels of MSDN.

4. It is good for namespace documents to exist.

Given that namespaces have definitive material, and that such definitive material is typically available on the Web, and that namespace names may be "http:"-class URIs, it is a grievous waste of potential if it is not possible to use the namespace name in retrieving the definitive material.

More prosaically, the fact that many namespace names are "http:"-class URLs leads to a widely-held expectation that they point to something, i.e. that there is a namespace document. Early efforts to dispel this expectation have universally failed and led to angst and confusion. In fact, when you use something whose most common application is retrieval, and you use it to identify something which has supporting definitive material, it is wilfully perverse not to connect the retrieval function with the supporting material.

5. Namespace names should not be relative URI references.

The argument in support of this thesis is given by reference.

6. Namespace names should not be URNs.

Given that namespace documents are a desirable thing, and given that at the present time, URNs are not effectively usable in the general population for retrieval of resources, URNs are not appropriate for use as namespace names.
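The two theses above amount to a simple check on a namespace name, which can be sketched in Python as follows. The function name and the category labels are hypothetical, and treating "https" alongside "http" is an assumption of this sketch rather than something argued above.

```python
from urllib.parse import urlparse


def classify_namespace_name(name: str) -> str:
    """Sort a namespace name into the categories discussed in theses 5 and 6."""
    scheme = urlparse(name).scheme  # empty when the name is a relative reference
    if not scheme:
        return "relative"
    if scheme == "urn":
        return "urn"
    if scheme in ("http", "https"):  # "https" is an assumption of this sketch
        return "retrievable"
    return "other"


print(classify_namespace_name("http://www.w3.org/1999/xhtml"))
print(classify_namespace_name("urn:schemas-microsoft-com:office:office"))
print(classify_namespace_name("../vocab/orders"))
```

Only names in the "retrievable" category give a general-purpose client an obvious way to go looking for a namespace document.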

7. The definitive material for a namespace is normally distributed among multiple resources.

"Definitive material" is a very broad term. Since semantics typically are built on a platform of syntax, basic definitive material usually consists of syntax constraints on the markup vocabulary identified by a namespace. This can take the form of a formal schema language (examples include DTDs, XML Schemas, Microsoft XDR, and Relax-NG), a set of grammar productions such as those that govern the values of xml:lang and xml:base, or even human-readable text such as for the value of xml:space.

For applications of XML whose semantics are mostly presentational, definitive material includes rendition specifications; obvious examples include HTML and SVG. This definitive material is mostly rendered in human-readable text, with a certain amount of supporting mathematics and diagramming.

In other cases, rendition specifications can be given in machine-readable form, for example as a CSS stylesheet.

In the context of modern computing technologies such as Java, .NET, and the whole spectrum of Web Services, the definitive material for a namespace can be expected to include classes, methods, and other forms of executable code that perform some validation, rendition, or other business function using that markup as input.

It is hoped that at some future point a higher proportion of definitive material can be captured in a semantically rich machine-readable form, presumably based on RDF or something like it. At the current time, the proportion of such definitive material is effectively zero. There is every reason to expect that for the foreseeable future, a certain proportion of definitive material will continue to exist in human-readable form, in the specification of rendition, and in executable code.

Thus, a namespace for which all the definitive material can be found in a single resource is an anomalous and uninteresting special case. There is one exception: a relatively "lightweight" namespace created ad-hoc to label some work-in-progress, for which the definitive material is merely a couple of paragraphs of explanation, or a statement such as "this is used for the functions in the Query API." In other words, in the case where the definitive material appears in one resource, it tends to consist of human-readable material.

8. Content-negotiation is not a sufficiently powerful tool for selecting definitive-material resources.

XHTML has at least three DTDs. It is normal for multiple versions of a language's definitive material to exist, one for each dialect or version. Content-negotiation chooses among representations by media type, language, and encoding, none of which distinguishes one DTD from another. It may be very helpful, but it is not in the general case credible as a solution for selecting among definitive resources.
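The difficulty can be sketched in a few lines of Python. This is a deliberately simplified model of negotiation that matches on media type alone and ignores quality values and wildcards; the stylesheet entry and the function name are hypothetical.

```python
# The three XHTML 1.0 DTDs share one media type; the stylesheet is hypothetical.
RESOURCES = {
    "xhtml1-strict.dtd": "application/xml-dtd",
    "xhtml1-transitional.dtd": "application/xml-dtd",
    "xhtml1-frameset.dtd": "application/xml-dtd",
    "xhtml.css": "text/css",
}


def candidates(accept: str) -> list:
    """Return every resource whose media type equals the requested one."""
    return sorted(name for name, media_type in RESOURCES.items()
                  if media_type == accept)


print(candidates("text/css"))
print(candidates("application/xml-dtd"))
```

A request for a stylesheet is answered unambiguously, but a request for a DTD leaves three candidates, and nothing the client can say about media types will narrow the choice.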

9. Namespace documents should provide a level of indirection.

The argument is based on the theses above:

  1. It should be possible to use the namespace name to retrieve definitive material regarding a namespace.
  2. Such definitive material is usually found in more than one resource.
  3. Content-negotiation is an insufficiently powerful tool for selecting among definitive resources.

Therefore the namespace document should provide a level of indirection, ideally serving as a directory.
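As a rough illustration of a namespace document serving as a directory, the Python sketch below reads a small RDDL-style document and picks out the resources offered for one purpose. The use of xlink:arcrole to carry the purpose and xlink:href to carry the location reflects my understanding of RDDL; the purpose URIs, file names, and function name are hypothetical.

```python
import xml.etree.ElementTree as ET

RDDL = "http://www.rddl.org/"
XLINK = "http://www.w3.org/1999/xlink"

# A hypothetical namespace document: XHTML prose plus two directory entries.
# The purpose URIs and file names are invented for illustration.
NAMESPACE_DOCUMENT = """\
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:rddl="http://www.rddl.org/"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <body>
    <h1>An Example Vocabulary</h1>
    <rddl:resource xlink:arcrole="http://example.org/purposes#validation"
                   xlink:href="example.xsd">
      <p>A schema for validating documents.</p>
    </rddl:resource>
    <rddl:resource xlink:arcrole="http://example.org/purposes#rendition"
                   xlink:href="example.css">
      <p>A stylesheet for rendering documents.</p>
    </rddl:resource>
  </body>
</html>"""


def find_resources(document: str, purpose: str) -> list:
    """Return the href of every directory entry offered for the given purpose."""
    root = ET.fromstring(document)
    hrefs = []
    for entry in root.iter(f"{{{RDDL}}}resource"):
        if entry.get(f"{{{XLINK}}}arcrole") == purpose:
            hrefs.append(entry.get(f"{{{XLINK}}}href"))
    return hrefs


print(find_resources(NAMESPACE_DOCUMENT, "http://example.org/purposes#rendition"))
```

The same document remains ordinary XHTML for a person reading it in a browser, while software follows the entry that matches what it is trying to do.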

10. Namespace names are frequently not dereferenced at run-time.

As noted above, there is no requirement that namespace URIs be dereferenced for namespaces to function as specified in their definitive material; they are simply names.

At a pragmatic level, there are many Web applications where, for a variety of reasons including performance, it is desirable that data objects be self-contained and processable without requiring the retrieval of supporting documents. The success of XML as opposed to SGML is partly due to its dropping the requirement that a document always be processed with its DTD.

This is a very common case - an application that knows how to deal with some markup vocabulary typically needs only to recognize the markup it knows.
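A minimal Python sketch of this common case: the application compares the namespace name of the root element against the names it already knows, as plain strings, and never opens a network connection. The handler descriptions and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"

# Hypothetical handlers, keyed by the namespace names this application knows.
HANDLERS = {
    XHTML: "render as hypertext",
    SVG: "render as vector graphics",
}


def namespace_of(tag: str) -> str:
    """Extract the namespace name from ElementTree's "{namespace}local" form."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def choose_handler(document: str) -> str:
    """Pick a handler by string comparison alone; nothing is retrieved."""
    root = ET.fromstring(document)
    return HANDLERS.get(namespace_of(root.tag), "unrecognized vocabulary")


print(choose_handler('<svg xmlns="http://www.w3.org/2000/svg"/>'))
print(choose_handler("<memo/>"))
```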

It is easy to imagine applications that rely heavily on run-time processing of supporting resources - an obvious example is a CSS stylesheet - but hard to defend the proposition that this is a dominant mode of operation, and especially hard to argue that such resources are retrieved via the namespace name, given the total lack of support from the Namespaces Recommendation.

11. Anyone should be able to write software to process a Web resource.

A key differentiating factor between the Web and most information systems that came before it is that anyone can, and many people do, write software to process data designed and produced by someone else. An advantage of descriptive markup - to my mind, the key advantage - is that it allows people to put data to use in ways not intended or envisioned by its creator.

Thus, it is an important design goal that a namespace's definitive material aid in the process of empowering everyone to write software to process Web resources.

12. Namespace documents should be human-readable.

A common use of namespaces - the only one that is blessed by the Namespaces Recommendation - is to enable software to recognize markup it knows how to deal with. Such software is written by two classes of person:

  1. Those who designed the markup vocabulary.
  2. Everyone else.

The second class is an important audience for a namespace's definitive material.

Furthermore, there is a large class of semantics for which there is no way to write a machine-readable description. Some of the syntax of, constraints on, and desired behavior in processing markup can only be expressed in human language.

Finally, given that namespace documents should contain definitive material, and that they are frequently not dereferenced at run-time, it follows that they are in many cases used at design time. Designers are mostly humans.

Thus human-readable technical documentation is an important component of a namespace's definitive material.

13. Namespace documents should not favor the needs of any one application or application class.

One of the main advantages of the Web architecture is its extreme generality - the same mechanisms successfully support retrieval and processing of resources by humans for display, by schema software for syntax validation, and by e-commerce software for executing transactions. None of these functions is architecturally any more important than any other. Therefore, the nature of the namespace document should not favor its use in any particular application class.

14. Namespace documents should not be "schemas".

The word "schema" is widely used to refer to collections of declarative constraints on the syntax, structure, and content of resources written in some specific formal language. Examples include DTDs, Microsoft XDR, XML Schemas, and Relax NG.

The chief application of this type of schema is support of validation (constraint-checking) applications. The software which processes them tends to be large, complex, and chiefly concerned with validation and error reporting. Therefore, using such a resource as a namespace document implies a prejudice in favor of validation-class applications.

Even if it were desirable in principle to use this class of resource as a namespace document, the question would arise: which one? There are a wide variety of languages available, including 3 different ones blessed by the W3C. Furthermore it is typical to use multiple versions of a schema document in one of these languages.

For all these reasons, the use of this type of resource as a namespace document is not architecturally sound.