Draft W3C TAG Findings of January 28, 2002

This document contains a set of findings representing the consensus of the W3C Technical Architecture Group (TAG), arrived at on Jan. 28, 2002. These findings were derived from discussion of TAG issues w3cMediaType-1, customMediaType-2, and nsMediaType-3, but in some cases extend beyond the specifics of the issues that were raised.

Media Types

Media types are an important part of the Web architecture; dispatching on them, when possible, is efficient, robust, and well understood.

W3C Working Groups engaged in defining a language should arrange for the registration of a media type for that language. The IETF registration forms should form part of the specification of the language, and should be in place as a condition of entry to Candidate Recommendation status.

The conventions and framework established by RFC 3023 should be followed when registering a media type for a language that uses XML syntax.
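As an illustration of the RFC 3023 naming convention, the following minimal sketch tests whether a Content-Type value names an XML media type. It assumes only the two generic types text/xml and application/xml plus the "+xml" suffix; the function name is_xml_media_type is hypothetical, and the other XML-related types registered by RFC 3023 are deliberately left out.

```python
from email.message import Message


def is_xml_media_type(content_type_header):
    """Return True if the header names an XML media type (simplified check)."""
    # Use the standard library's MIME header parsing to drop any parameters.
    msg = Message()
    msg["Content-Type"] = content_type_header
    media_type = msg.get_content_type()  # lower-cased "type/subtype"
    # Assumption: only the generic XML types and the "+xml" suffix are checked.
    return media_type in ("text/xml", "application/xml") or media_type.endswith("+xml")
```

A language registered as, say, image/svg+xml is recognized as XML by this test without the receiver needing any knowledge of the specific language.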

Namespace-Based Dispatching

When processing XML documents, it is appropriate for Web applications to dispatch elements to modules for processing based on the namespace of the element type.

Correct dispatching and processing requires context; in general it is neither reasonable nor safe to do namespace-based processing without knowledge of the namespaces of ancestor elements. Because of this, the namespace of the root element of an XML document has special status and serves naturally as a basis for top-level software dispatching in the case where the dispatch information is not externally supplied.

It is acknowledged that there are exceptions to this rule, for example XSLT documents whose root element's namespace depends on the desired output from application of the XSLT.
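Top-level dispatching on the root element's namespace might be sketched as follows. This is an illustration under stated assumptions, not a prescribed implementation: the HANDLERS table, the handler names, and the external_handler parameter are all hypothetical, and the XSLT exception noted above is not handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical table mapping a root namespace URI to a processing module name.
HANDLERS = {
    "http://www.w3.org/1999/xhtml": "xhtml-renderer",
    "http://www.w3.org/2000/svg": "svg-renderer",
}


def root_namespace(xml_text):
    """Return the namespace URI of the root element, or None if it has none."""
    root = ET.fromstring(xml_text)
    # ElementTree writes qualified names as "{namespace-uri}local-name".
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return None


def dispatch(xml_text, external_handler=None):
    """Choose a handler, using the root namespace only as a fallback."""
    # Externally supplied dispatch information takes precedence.
    if external_handler is not None:
        return external_handler
    # Otherwise fall back to the namespace of the root element.
    return HANDLERS.get(root_namespace(xml_text))
```

For example, dispatch('<svg xmlns="http://www.w3.org/2000/svg"/>') selects the hypothetical "svg-renderer" entry, while a document whose root namespace is not in the table yields None and is left for the caller to report.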

Consistency of Media Types and Resource Contents

The architecture of the Web depends on applications making dispatching and security decisions for resources based on their media types and other MIME headers. It is a serious error for the resource body to be inconsistent with the assertions made about it by the MIME headers. Web software should not attempt to recover from such errors by guessing, but should report the error to the user to allow intelligent corrective action.

An example of incorrect and dangerous behavior is a user agent that reads some part of the body of a resource and decides to treat it as HTML because it contains a <!DOCTYPE declaration or a <title> tag, when the resource was served as text/plain or some other non-HTML type.

Examples of such inconsistencies that have been observed on the Web include:

A resource which is an XML document whose MIME headers contain a charset parameter, but whose Unicode encoding differs from that given in the charset.
A resource which is an XML document where the media type header is inconsistent with the namespace of the root element.
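The second inconsistency listed above can be detected, and reported rather than guessed around, along the following lines. This is a minimal sketch: the EXPECTED_ROOT_NS table, the InconsistentResourceError class, and the function name are hypothetical, and a real receiver would need a fuller table and a policy for the acknowledged exceptions such as XSLT.

```python
import xml.etree.ElementTree as ET
from email.message import Message

# Hypothetical table: media type -> root namespace that type implies.
EXPECTED_ROOT_NS = {
    "image/svg+xml": "http://www.w3.org/2000/svg",
    "application/xhtml+xml": "http://www.w3.org/1999/xhtml",
}


class InconsistentResourceError(Exception):
    """The body contradicts the assertions made by the MIME headers."""


def check_root_namespace(content_type_header, body):
    """Return the declared media type, or raise if the body contradicts it."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    media_type = msg.get_content_type()
    expected = EXPECTED_ROOT_NS.get(media_type)
    if expected is None:
        # No expectation recorded: trust the header and do not sniff the body.
        return media_type
    root = ET.fromstring(body)
    actual = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else None
    if actual != expected:
        # Report the error; do not silently reinterpret the resource.
        raise InconsistentResourceError(
            f"{media_type} declared, but root namespace is {actual!r}"
        )
    return media_type
```

Note that a body served as text/plain is never inspected here, even if it begins with <!DOCTYPE; the declared type alone decides how it is treated.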

Consistency in Communicating Character Encoding

The first example in the preceding section is a particularly troublesome case. RFC 3023 gives rules for when the charset parameter should be used, and states that it is always authoritative. However, a receiving application can, with very high reliability, determine the encoding of an XML document by reading it, without reference to any external headers. Consequently, server-side applications should supply a charset parameter for XML resources only where there is complete certainty as to the encoding in use, since an error will cause a perfectly usable resource to be rejected by an architecturally sound client.
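The rejection described above might look like the following on the receiving side. This is a simplified sketch, not the full detection procedure of the XML specification: it looks only at a UTF-8 or UTF-16 byte order mark and at the encoding declaration, and it assumes UTF-8 otherwise. The function names are hypothetical. When no charset parameter is present, the sketch relies on the document's own declaration, which is the RFC 3023 behavior for application/xml; the different default for text/xml is ignored here.

```python
import codecs
import re
from email.message import Message

# Matches the encoding pseudo-attribute of an XML declaration at the start of the body.
_DECL = re.compile(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def sniff_xml_encoding(body):
    """Simplified self-detection: BOM first, then the declaration, else UTF-8."""
    if body.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if body.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    match = _DECL.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"


def same_encoding(a, b):
    """Compare two encoding labels, treating known aliases as equal."""
    try:
        return codecs.lookup(a).name == codecs.lookup(b).name
    except LookupError:
        return a.lower() == b.lower()


def check_charset(content_type_header, body):
    """Return the encoding to use, or raise if header and body disagree."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    declared = msg.get_content_charset()
    detected = sniff_xml_encoding(body)
    if declared is None:
        return detected  # no external assertion was made
    if not same_encoding(declared, detected):
        # The charset parameter is authoritative, so the mismatch is an error.
        raise ValueError(f"charset={declared} but document appears to be {detected}")
    return declared
```

A server that omits the charset parameter when it is unsure of the encoding avoids the ValueError branch entirely, which is the behavior the finding recommends.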
