Related-Resource Discovery for XML
Tim Bray, September 1999

Many applications of XML are designed to process XML resources in combination with other related or supporting resources. Such resources currently include DTDs, stylesheets, RDF metadata, human-readable documentation, and executable code; many other types of related resource are in active development. In general, there is no standardized interoperable way for an XML resource to include information to aid applications in retrieving such related resources.

W3C recommendations currently provide syntax for XML documents to include pointers to DTDs and to stylesheets. These methods are ad-hoc, not compatible with each other, and represent nobody's idea of a general solution to the problem of retrieval of related resources.
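
To make the incompatibility concrete, here is a minimal Python sketch that recovers both pointers from one document. It assumes a made-up memo document whose related files are named memo.dtd and memo.css; the function name related_pointers is also hypothetical. The DTD pointer comes from the document type declaration, while the stylesheet pointer has to be dug out of the pseudo-attributes of an xml-stylesheet processing instruction, so each needs its own extraction logic.

```python
import re
from xml.dom import minidom

# Hypothetical document: memo.dtd and memo.css are made-up file names.
DOC = """<?xml version="1.0"?>
<!DOCTYPE memo SYSTEM "memo.dtd">
<?xml-stylesheet href="memo.css" type="text/css"?>
<memo>Related resources are found in two different ways.</memo>
"""

def related_pointers(xml_text):
    """Collect the DTD and stylesheet pointers carried by a document."""
    doc = minidom.parseString(xml_text)
    pointers = {}

    # Mechanism 1: the system identifier in the DOCTYPE declaration.
    if doc.doctype is not None and doc.doctype.systemId:
        pointers["dtd"] = doc.doctype.systemId

    # Mechanism 2: pseudo-attributes inside a processing instruction.
    # The parser hands these over as one opaque string, so they are
    # picked apart by hand here.
    for node in doc.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            pseudo = dict(re.findall(r'([\w-]+)\s*=\s*"([^"]*)"', node.data))
            pointers["stylesheet"] = pseudo.get("href")
    return pointers

POINTERS = related_pointers(DOC)
print(POINTERS)
```

Neither branch of this function could be reused for a third kind of related resource, which is the gap the rest of this note is concerned with.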

Why the Problem is Important

The number of resource types that are potentially related to an XML resource is growing rapidly. Within a year or two, the list of types may reasonably be expected to include the following.

Schemas
Document Type Definitions, W3C XML Schemas, RDF Schemas, and proprietary schemas such as Microsoft's XML-Data.
Stylesheets
CSS (in its 1, 2, and 3 variants) and XSL.
Metadata
Packaged as RDF assertions in a potentially infinite number of vocabularies.
Transform Specifications
For now, XSLT.
Human-readable Documentation
In XML, HTML, PDF, or binary word processor formats.
Executable Code
Interpreted (Perl, Python, JavaScript) or compiled (C, Java).
Graphics
In raster (PNG, GIF, JPG) and vector (SVG, VML, PGML) formats.
Hypertexts
Other resources which serve as targets or sources of XLink hyperlinks.

In many cases, it is valuable for an XML resource to help the applications that use it find such related resources. In the absence of such help, applications which need these resources will implement a selection of ad-hoc hardwired techniques for finding them, with no expectation of interoperability. It is undesirable for each application to embed its own idiosyncratic markup in the document for this purpose (as is the case with DTDs and stylesheets). It is even more undesirable for applications to rely on built-in hardwired rules based on filenames, extensions, or the like.

The Solution Space

The following are desirable characteristics of a related-resource-discovery solution:

Network-Agnostic
The solution must work in both networked and local environments. That is to say, there can be no dependency on the use of any particular network protocol to aid in related-resource discovery. For this reason, it seems that content negotiation is an unlikely candidate for a solution.
Catalog and Package
Clearly it is necessary to provide pointers to remote related resources of various types. It is also desirable, though, to package up those resources and transmit them in one physical unit along with the "primary" document they relate to.
Consistent with Standard Practice
The solution should not invent any new syntaxes, declaration mechanisms, naming mechanisms, or linking mechanisms. The standards infrastructure is sufficiently robust to support the development of facilities to enable related-resource discovery.
Type-Agnostic
The number of related-resource types is already considerable and can be expected to grow without bound. Thus a related-resource discovery package must be able to accommodate an arbitrary number of types, and further be able to support the discovery of resources by type and of type by resource.

The Role of Schemas

Schemas are metadata resources, usually declarative in nature, that describe the organization, content, and (in a limited sense) meaning of an XML document. The widespread deployment of schema technology is expected to bring a substantial increase in the richness and utility of the Web as a whole.

However, across the space of applications, it is clearly the case that schemas are merely one type of related resource, whose importance relative to the other types is often high but always application-dependent.

The Role of Namespaces

XML Namespaces provide a mechanism for universalizing the names of XML elements and attributes by turning them into two-part objects, the added part being a URI. The combination of the URI and local name is guaranteed unique across the spectrum of applications and provides a robust and efficient means to group the universe of names into named vocabularies ("namespaces").
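
The two-part name can be seen directly in a short Python sketch. The vocabularies and their example.com URIs below are invented for illustration, and expanded_name is a hypothetical helper. Python's ElementTree reports each element name as "{uri}local", so two elements that share the local name "title" still end up with distinct names.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabularies: the namespace URIs are illustrative only.
DOC = """<order xmlns="http://example.com/ns/orders"
       xmlns:b="http://example.com/ns/books">
  <title>Rush delivery</title>
  <b:title>A Book</b:title>
</order>"""

def expanded_name(tag):
    """Split ElementTree's '{uri}local' tag into a (uri, local) pair."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag  # element is in no namespace

# Walk the document in order and list each element's two-part name.
NAMES = [expanded_name(el.tag) for el in ET.fromstring(DOC).iter()]
for uri, local in NAMES:
    print(uri, local)
```

The prefix "b" does not appear in the result; only the URI and the local name take part in the comparison.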

The fact that a namespace name is a Uniform Resource Identifier has led many to the not-unreasonable conclusion that it Identifies a Resource, and that this resource serves a definitional function for the namespace.

This was not the intention of the Working Group that produced the namespace specification, simply because there was no facility available that could specify the wide range of semantics that can be associated with a vocabulary. However, the notion that the namespace name ought to be usable for the retrieval of something is plausible and worth serious consideration.

For example, should a related-resource discovery facility take the form of an XML document containing links to some related resources and the content of others, it would be unsurprising to expect to use the namespace name to retrieve that.

Furthermore, in such an XML document, namespace names are attractive candidates for use in identifying the types of the related resources.
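
As a rough illustration of what such a document might look like, the Python sketch below reads a purely hypothetical catalog. The element and attribute names (catalog, resource, type, href), the example.org URIs, and the helper functions are all assumptions made for this sketch; none of them comes from any specification. One entry links to a remote resource and the other carries its content inline, and each entry's type is identified by a URI in the way a namespace name might be used. The two lookups correspond to discovery of resources by type and of type by resource.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog format: every element and attribute name here is
# invented for illustration and is not defined by any specification.
CATALOG = """<catalog>
  <resource type="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
            href="http://example.org/memo/meta.rdf"/>
  <resource type="http://example.org/types/documentation">
    <doc>Memos carry one title and one body.</doc>
  </resource>
</catalog>"""

def load(catalog_text):
    """Turn the catalog into a list of entries (linked or inline)."""
    entries = []
    for res in ET.fromstring(catalog_text).iter("resource"):
        entries.append({
            "type": res.get("type"),                # URI naming the type
            "href": res.get("href"),                # None when packaged inline
            "inline": res[0] if len(res) else None  # first child, if any
        })
    return entries

def resources_of_type(entries, type_uri):
    """Discovery of resources by type."""
    return [e for e in entries if e["type"] == type_uri]

def type_of_resource(entries, href):
    """Discovery of type by resource; None if the resource is unknown."""
    for e in entries:
        if e["href"] == href:
            return e["type"]
    return None

ENTRIES = load(CATALOG)
print(type_of_resource(ENTRIES, "http://example.org/memo/meta.rdf"))
```

Because the type is just a URI, adding a new kind of related resource in this sketch requires no change to the catalog's structure or to the lookup code.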

Conclusion

This problem is not in the future. Applications are already doing related-resource discovery today, and will do so at an increasing rate. If no interoperable, scalable method of doing this is available, there will be no interoperability or scalability in industry practice.