This document contains a set of findings representing the consensus of the W3C Technical Architecture Group (TAG), arrived at on Jan. 28, 2002. These findings were derived from discussion of TAG issues w3cMediaType-1, customMediaType-2, and nsMediaType-3, but in some cases extend beyond the specifics of the issue that was raised.
Media types are an important part of the web architecture; dispatching on them, when possible, is efficient, robust, and well understood.
W3C Working Groups engaged in defining a language should arrange for the registration of a media type for that language. The IETF registration forms should form part of the specification of the language, and should be in place as a condition of entry to Candidate Recommendation status.
The conventions and framework established by RFC 3023 should be followed when registering a media type for a language that uses XML syntax.
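As a rough illustration of what following the RFC 3023 framework buys a receiver, the sketch below recognizes XML media types by name alone. The function name `is_xml_media_type` is hypothetical, and the check is limited to the two generic XML document types plus the `+xml` suffix convention; the other types registered by RFC 3023 are left out.

```python
# Generic XML document types defined by RFC 3023.
GENERIC_XML_TYPES = {"text/xml", "application/xml"}


def is_xml_media_type(content_type: str) -> bool:
    """Return True if the media type names an XML-based format.

    Hypothetical helper: parameters such as charset are ignored, and
    matching is case-insensitive, as media type names are.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in GENERIC_XML_TYPES or media_type.endswith("+xml")
```

A language registered as, say, `image/svg+xml` is then recognizable as XML by generic tools that know nothing else about it.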
When processing XML documents, it is appropriate for Web applications to dispatch elements to modules for processing based on the namespace of the element type.
Correct dispatching and processing requires context: in general it is neither reasonable nor safe to do namespace-based processing without knowledge of the namespace of ancestor elements. Because of this, the namespace of the root element of an XML document has special status and serves naturally as a basis for top-level software dispatching in the case where the dispatch information is not externally supplied.
It is acknowledged that there are exceptions to this rule, for example XSLT documents whose root element's namespace depends on the desired output from application of the XSLT.
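Top-level dispatch on the root element's namespace might look like the following minimal sketch. The handler table and the names `root_namespace` and `dispatch` are illustrative assumptions, not part of any specification; a real application would register processing modules rather than strings.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry mapping root namespaces to handler names.
HANDLERS = {
    "http://www.w3.org/2000/svg": "svg-renderer",
    "http://www.w3.org/1999/xhtml": "xhtml-renderer",
}


def root_namespace(xml_text: str) -> str:
    """Return the namespace URI of the document's root element ('' if none)."""
    tag = ET.fromstring(xml_text).tag
    # ElementTree spells qualified names as "{namespace-uri}local-name".
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return ""


def dispatch(xml_text: str) -> str:
    """Pick a handler from the root namespace only; fail loudly otherwise."""
    ns = root_namespace(xml_text)
    if ns not in HANDLERS:
        raise LookupError(f"no handler registered for root namespace {ns!r}")
    return HANDLERS[ns]
```

Note that an SVG fragment nested inside an XHTML document is still dispatched to the XHTML handler here, since only the root element is consulted; descendants would be handed on by that handler with their ancestor context intact.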
The architecture of the Web depends on applications making dispatching and security decisions for resources based on their media types and other MIME headers. It is a serious error for the resource body to be inconsistent with the assertions made about it by the MIME headers. Web software should not attempt to recover from such errors by guessing, but should report the error to the user to allow intelligent corrective action.
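One way to report an inconsistency instead of silently recovering from it is sketched below for a few image types. The signature table uses the well-known leading bytes of GIF, PNG, and JPEG files; the exception class and the function `check_body_matches` are hypothetical names chosen for this example.

```python
# Leading byte signatures for a few image formats.
SIGNATURES = {
    "image/gif": (b"GIF87a", b"GIF89a"),
    "image/png": (b"\x89PNG\r\n\x1a\n",),
    "image/jpeg": (b"\xff\xd8\xff",),
}


class InconsistentMediaType(Exception):
    """The body contradicts the media type asserted in the headers."""


def check_body_matches(content_type: str, body: bytes) -> str:
    """Return the declared media type, or raise if the body contradicts it.

    The body is never used to pick a different type; it is only checked
    against the type the server asserted.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    expected = SIGNATURES.get(media_type)
    if expected is not None and not body.startswith(expected):
        raise InconsistentMediaType(
            f"resource served as {media_type} does not look like {media_type}"
        )
    return media_type
```

The caller is expected to surface the exception to the user rather than retry with a guessed type.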
An example of incorrect and dangerous behavior is a user-agent which reads some part of the body of a resource and decides to treat it as HTML based on its containing a <!DOCTYPE declaration or <title> tag, when it was served as text/plain or some other non-HTML type.
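The behavior the finding asks for can be sketched as a renderer choice that depends on the declared media type alone. The renderer table and the names below are assumptions made for illustration.

```python
# Hypothetical table of renderers keyed by declared media type.
RENDERERS = {"text/html": "html", "text/plain": "plain-text"}


class UnsupportedMediaType(Exception):
    """No renderer is registered for the declared media type."""


def choose_renderer(content_type: str, body: bytes) -> str:
    """Choose a renderer from the Content-Type header only.

    `body` is accepted but deliberately not inspected: a <!DOCTYPE or
    <title> inside a text/plain resource must not change the outcome.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type not in RENDERERS:
        raise UnsupportedMediaType(media_type)
    return RENDERERS[media_type]
```

With this shape, a body such as `<!DOCTYPE html><title>x</title>` served as text/plain is shown as plain text, which is what the server asserted.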
Examples of such inconsistencies that have been observed on the Web include:
A resource served with a charset parameter, but whose Unicode encoding differs from that given in the charset.
The first example in the preceding is a particularly troublesome case.
RFC 3023 gives rules for when the charset parameter should be used, and states that it is always authoritative. However, a receiving application can with very high reliability determine the encoding of an XML document by reading it, without reference to any external headers. The consequence is that server-side applications should ensure that, for XML resources, they supply a charset parameter only where there is complete certainty as to the encoding in use, since an error will cause a perfectly usable resource to be rejected by an architecturally sound client.
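To make the rejection scenario concrete, here is a minimal sketch of a client that reads the encoding from the XML document itself and compares it with the charset parameter. It is a simplification under stated assumptions: only UTF-8 and UTF-16 byte order marks and an ASCII-readable encoding declaration are recognized, UTF-32 and EBCDIC detection are omitted, and the RFC 3023 default for text/xml without a charset is not modeled. All function and class names are hypothetical.

```python
import codecs
import re
from email.message import Message
from typing import Optional

# Encoding declaration at the very start of an ASCII-compatible document.
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


class CharsetMismatch(Exception):
    """The charset parameter contradicts the document's own encoding."""


def sniff_xml_encoding(body: bytes) -> str:
    """Determine the encoding from the document itself (simplified)."""
    if body.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if body.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    match = _XML_DECL.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"  # XML default when nothing else is stated


def declared_charset(content_type: str) -> Optional[str]:
    """Extract the charset parameter (lowercased), or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def effective_encoding(content_type: str, body: bytes) -> str:
    """Return the encoding to use, rejecting header/body disagreement."""
    sniffed = sniff_xml_encoding(body)
    declared = declared_charset(content_type)
    if declared is None:
        return sniffed
    # Compare canonical codec names so that aliases are treated as equal.
    if codecs.lookup(declared).name != codecs.lookup(sniffed).name:
        raise CharsetMismatch(f"header says {declared}, document says {sniffed}")
    return declared
```

A server that omits the charset parameter when it is unsure lets such a client fall through to the first branch and read the document successfully; a wrong charset parameter leads to the exception even though the document itself is well formed.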