Tim Bray, February 2002
A namespace name is defined to be a URI reference. Some namespace names may be dereferenced; when this is done, the result is referred to as the namespace document.
In December 2000, I co-edited a proposal for the Resource Directory Description Language (RDDL), an extension of XHTML with a <rddl:resource> element, designed for use in namespace documents.
This document is an outline of the architectural principles
which led to the design of RDDL, although they were not thought through in
this level of detail at that time.
This document represents only the opinion of its author and has not been reviewed or approved by any other person or organization.
The only definitive document for namespaces with official standing is the W3C Recommendation Namespaces in XML, which explains that namespaces exist for the purpose of aiding software in recognizing markup vocabularies and avoiding collisions between identical names in different markup vocabularies. The existence of a namespace document is not necessary to achieve this purpose.
Furthermore, there are many widely-deployed namespaces (e.g. for Microsoft Office) whose namespace names are URNs and for which there exists no namespace document; these namespaces nonetheless effectively serve the purpose of enabling software to recognize markup and to avoid name collisions. They are an existence proof that namespace documents are not strictly necessary.
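The collision-avoidance function described above can be illustrated with a minimal sketch. It assumes Python's standard xml.etree.ElementTree module; both namespace names and the sample document are hypothetical, invented for illustration. Two vocabularies use the local name "title", and software tells them apart by namespace name alone; one of the names is a URN with no namespace document behind it.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace names; the second is a URN with no namespace document.
BOOK_NS = "http://example.org/ns/book"
PERSON_NS = "urn:example:person"

DOC = f"""\
<b:book xmlns:b="{BOOK_NS}" xmlns:p="{PERSON_NS}">
  <b:title>Namespaces in Practice</b:title>
  <p:author>
    <p:title>Dr.</p:title>
    <p:name>A. Writer</p:name>
  </p:author>
</b:book>
"""


def titles_by_namespace(xml_text):
    """Group the text of every element with local name 'title' by namespace name."""
    found = {}
    for elem in ET.fromstring(xml_text).iter():
        # ElementTree expands qualified names to '{namespace-name}local-name'.
        if elem.tag.startswith("{"):
            ns, local = elem.tag[1:].split("}", 1)
        else:
            ns, local = "", elem.tag
        if local == "title":
            found.setdefault(ns, []).append(elem.text)
    return found


titles = titles_by_namespace(DOC)
print(titles)
```

Nothing is retrieved from either namespace name; each is used purely as a string to compare against.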
It is indicative that the Recommendation goes out of its way to assert that it is not a goal that a namespace name be directly usable to retrieve a schema. This may be read either as a general statement about the desirability of namespace documents, or as a more specific statement about the suitability of schemas for use as namespace documents.
In general, what can we conclude from the fact that an XML element is in a particular namespace? In particular, what can we conclude from the namespace of the root element of an XML document?
The answer varies widely from namespace to namespace. Some namespaces make strong claims as to the content and structure. Examples would include MathML and SVG, which come with very complete descriptions of their allowed content and semantics.
Another class of namespaces includes XLink and the one hardwired to the prefix "xml:", which define a small number of attributes but allow great freedom in the content of elements.
An interesting and common intermediate class is a namespace that makes some assertions about the syntax and semantics of its content, but is explicitly designed to allow the inclusion of content with markup from other namespaces. Perhaps the best-known example would be XHTML.
Every namespace currently in use has definitive material; it is difficult to
imagine a namespace for which there exists none.
In the case of the namespace hardwired to "xml:", the material consists
of a few paragraphs of text in a couple of W3C recommendations.
In the case of XHTML, it includes 3 DTDs and a large volume of
descriptive prose.
In the case of the Microsoft Office-related namespaces, it consists of a
quantity of documentation which may be found with considerable difficulty, but
not reliably bookmarked, in the bowels of MSDN.
Given that namespaces have definitive material, and that such definitive
material is typically available on the Web, and that namespace names may be
"http:"-class URIs, it is a grievous waste of potential if it is
not possible to use the namespace name in retrieving the definitive
material.
More prosaically, the fact that many namespace names are
"http:"-class URLs leads to a widely-held expectation that they
point to something, i.e. that there is a namespace document.
Early efforts to dispel this expectation have universally failed and led to
angst and confusion.
In fact, when you use something whose most common application is retrieval,
and you use it to identify something which has supporting definitive material,
it is wilfully perverse not to connect the retrieval function with the
supporting material.
The argument in support of this thesis is given by reference.
Given that namespace documents are a desirable thing, and given that at the present time, URNs are not effectively usable in the general population for retrieval of resources, URNs are not appropriate for use as namespace names.
"Definitive material" is a very broad term.
Since semantics typically are built on a platform of syntax, basic
definitive material usually consists of syntax constraints on the markup
vocabulary identified by a namespace.
This can take the form of a formal schema language (examples include DTDs, XML
Schemas, Microsoft XDR, and Relax NG), a set of grammar productions such as
those that govern the values of xml:lang and xml:base, or even human-readable
text such as that describing the value of xml:space.
For applications of XML whose semantics are mostly presentational, definitive material includes rendition specifications; obvious examples include HTML and SVG. This definitive material is mostly rendered in human-readable text, with a certain amount of supporting mathematics and diagramming.
In other cases, rendition specifications can be given in machine-readable form, for example as a CSS stylesheet.
In the context of modern computing technologies such as Java, .NET, and the whole spectrum of Web Services, the definitive material for a namespace can be expected to include classes, methods, and other forms of executable code that perform some validation, rendition, or other business function using that markup as input.
It is hoped that at some future point a higher proportion of definitive material can be captured in a semantically rich machine-readable form, presumably based on RDF or something like it. At the current time, the proportion of such definitive material is effectively zero. There is every reason to expect that for the foreseeable future, a certain proportion of definitive material will continue to exist in human-readable form, in the specification of rendition, and in executable code.
Thus, a namespace for which all the definitive material can be found in a single resource is an anomalous and uninteresting special case. There is one exception: a relatively "lightweight" namespace created ad-hoc to label some work-in-progress, for which the definitive material is merely a couple of paragraphs of explanation, or a statement such as "this is used for the functions in the Query API." In other words, in the case where the definitive material appears in one resource, it tends to consist of human-readable material.
XHTML has at least three DTDs. It is normal for multiple versions of a language's definitive material to exist, so that users can select among dialects and versions. Content negotiation may be very helpful but is not in the general case credible as a solution for selecting among definitive resources.
The argument is based on the theses above:
Therefore the namespace document should provide a level of indirection, ideally serving as a directory.
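The directory idea can be sketched as follows. This is a minimal illustration, not a definitive RDDL processor. It assumes that each <rddl:resource> element carries XLink attributes, with xlink:role giving the nature of the related resource, xlink:arcrole its purpose, and xlink:href its location, and that the RDDL namespace name is http://www.rddl.org/. The nature and purpose identifiers, the file names, and the helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
XLINK_NS = "http://www.w3.org/1999/xlink"
RDDL_NS = "http://www.rddl.org/"  # assumed RDDL namespace name

# Hypothetical nature and purpose identifiers, invented for this sketch.
DTD = "http://example.org/nature/dtd"
XSD = "http://example.org/nature/xml-schema"
CSS = "http://example.org/nature/css"
VALIDATION = "http://example.org/purpose/validation"
RENDITION = "http://example.org/purpose/rendition"

# A hypothetical namespace document: human-readable XHTML plus directory entries.
NAMESPACE_DOCUMENT = f"""\
<html xmlns="{XHTML_NS}" xmlns:xlink="{XLINK_NS}" xmlns:rddl="{RDDL_NS}">
  <body>
    <h1>An example vocabulary</h1>
    <p>Human-readable documentation for designers goes here.</p>
    <rddl:resource xlink:role="{DTD}" xlink:arcrole="{VALIDATION}" xlink:href="example.dtd"/>
    <rddl:resource xlink:role="{XSD}" xlink:arcrole="{VALIDATION}" xlink:href="example.xsd"/>
    <rddl:resource xlink:role="{CSS}" xlink:arcrole="{RENDITION}" xlink:href="example.css"/>
  </body>
</html>
"""


def directory(namespace_document):
    """Collect (nature, purpose, href) entries from every rddl:resource element."""
    root = ET.fromstring(namespace_document)
    return [
        {
            "nature": res.get(f"{{{XLINK_NS}}}role"),
            "purpose": res.get(f"{{{XLINK_NS}}}arcrole"),
            "href": res.get(f"{{{XLINK_NS}}}href"),
        }
        for res in root.iter(f"{{{RDDL_NS}}}resource")
    ]


def select(entries, nature=None, purpose=None):
    """Return the hrefs of entries matching the requested nature and/or purpose."""
    return [
        e["href"]
        for e in entries
        if (nature is None or e["nature"] == nature)
        and (purpose is None or e["purpose"] == purpose)
    ]


entries = directory(NAMESPACE_DOCUMENT)
print(select(entries, purpose=VALIDATION))
print(select(entries, nature=CSS))
```

The client, not the server, decides which of several definitive resources it wants, which is the selection problem that content negotiation does not solve in the general case.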
As noted above, there is no requirement that namespace URIs be dereferenced for namespaces to function as specified in their definitive material; they are simply names.
At a pragmatic level, there are many Web applications where, for a variety of reasons including performance, it is desirable that data objects be self-contained and processable without requiring the retrieval of supporting documents. The success of XML as opposed to SGML is partly due to its dropping the requirement that a document always be processed with its DTD.
This is a very common case - an application that knows how to deal with some markup vocabulary typically needs only to recognize the markup it knows.
It is easy to imagine applications that rely heavily on run-time processing of supporting resources - an obvious example is a CSS stylesheet - but hard to defend the proposition that this is a dominant mode of operation and especially hard to argue that they are retrieved via namespace name, given the total lack of support from the Namespace Recommendation.
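The common case, in which an application simply recognizes the markup it knows and never dereferences the namespace name, might look like the following sketch. The invoice vocabulary, its namespace name, and the extension namespace are all hypothetical, and the code assumes Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace name for a vocabulary this application understands.
KNOWN_NS = "http://example.org/ns/invoice"

DOC = f"""\
<inv:invoice xmlns:inv="{KNOWN_NS}" xmlns:x="urn:example:extension">
  <inv:line amount="10.50"/>
  <x:note>Markup from another namespace; this application skips it.</x:note>
  <inv:line amount="4.25"/>
</inv:invoice>
"""


def invoice_total(xml_text):
    """Sum the amounts of the line elements in the known namespace.

    The namespace name is only compared as a string; nothing is retrieved.
    """
    root = ET.fromstring(xml_text)
    return sum(float(line.get("amount")) for line in root.iter(f"{{{KNOWN_NS}}}line"))


total = invoice_total(DOC)
print(total)
```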
A key differentiating factor between the Web and most information systems
that came before it is that anyone can, and many people do, write software to
process data designed and produced by someone else.
An advantage of descriptive markup - to my mind,
Thus, it is an important design goal that a namespace's definitive material aid in the process of empowering everyone to write software to process Web resources.
A common use of namespaces - the only one that is blessed by the Namespaces Recommendation - is to enable software to recognize markup it knows how to deal with. Such software is written by two classes of person:
The second class is an important audience for a namespace's definitive material.
Furthermore, there is a large class of semantics for which there
is no way to write a machine-readable description.
Some of the syntax of, constraints on, and desired behavior in processing markup
can
Finally, given that namespace documents should contain definitive material, and that they are frequently not dereferenced at run-time, it follows that they are in many cases used at design time. Designers are mostly humans.
Thus human-readable technical documentation is an important component of a namespace's definitive material.
One of the main advantages of the Web architecture is its extreme
generality - the same mechanisms successfully support retrieval and
processing of resources by humans for display, by schema software for
syntax validation, and by e-commerce software for executing transactions.
The word "schema" is widely used to refer to collections of declarative constraints on the syntax, structure, and content of resources written in some specific formal language. Examples include DTDs, Microsoft XDR, XML Schemas, and Relax NG.
The chief application of this type of schema is support of validation (constraint-checking) applications. The software which processes them tends to be large, complex, and chiefly concerned with validation and error reporting. Therefore, using such a resource as a namespace document implies a prejudice in favor of validation-class applications.
Even if it were desirable in principle to use this class of resource as a namespace document, the question would arise: which one? There is a wide variety of languages available, including 3 different ones blessed by the W3C. Furthermore, it is typical to use multiple versions of a schema document in one of these languages.
For all these reasons, the use of this type of resource as a namespace document is not architecturally sound.