<?XML DSD paper SYSTEM "paper.dsd">
<paper>
<title>MGML - an SGML Application for Describing Document Markup Languages</title>
<auths>
<auth></auth>
<auth>...</auth>
<auth>Tim Bray, Textuality</auth>
</auths>

<sec><title>Why MGML?</title>

<p>The Standard Generalized Markup Language <ref>Goldfarb</ref> is the most
fully developed specification of the use of descriptive markup languages for
electronic documents.  The idea of descriptive markup <ref>Coombs87</ref> is
simple and powerful, and in fact has proved to be a basic requirement for many
advanced information processing applications.</p>

<p>Unfortunately, the adoption of SGML has proved surprisingly difficult,
expensive and slow, given that the underlying ideas are simple and 
self-evidently good.  Some of the perceived reasons have included:</p>

<list style="num">

<item>The SGML standard itself <ref>ISO</ref> is large, complex, and difficult to 
understand.</item>

<item>The standard specifies several optional and advanced markup features, some
of which remain unimplemented.</item>

<item>Some of the features of SGML have proven counter-productive
in practical use.</item>

<item>Practical use of SGML requires learning several other languages,
including the language used to write DTDs, various stylesheet and
formatting languages, and the SGML/Open Entity Catalogue language.</item>

<item>The design of SGML takes little account of the contemporary theory of
formal languages and finite automata.  One practical result is that SGML
parsers are unable to make use of some advanced tools and techniques made
possible by that theory.  Consequently, they are large and complex pieces of
computer software; as such they (a) suffer from reliability problems, (b) have
in practice proven difficult to integrate into applications, and (c) change
slowly in response to advances in software and document processing
technology.</item>

</list>

<p>Nonetheless, there remains a consensus that SGML's basic design, which
partitions a document into entities, elements, and attributes, is correct and
useful.  One result is a common tendency, in strategic projects involving SGML,
to avoid many of the advanced features and operate within a highly restricted subset.
This approach has generally met with success.  However, this restricted subset
has been re-invented by each successive group that has attacked the
problem.</p>

<p>It is our opinion that SGML exhibits an extreme case of the <qu>80-20
syndrome</qu>; that is to say, 80% of the benefit is gained by applying only
20% of the machinery.  The goal of this project is to formalize the definition
of this useful subset, which we call the Minimal Generalized Markup Language
(MGML).</p>

<p>The design goals are that MGML shall:

<list style="num">

<item>be a conforming SGML application, and process a proper subset of
SGML documents</item>

<item>provide full support for the basic mechanisms (entities, elements, and
attributes) which have made SGML successful</item>

<item>unify the syntax of the meta-language and the generated languages (the DTD
and the instances)</item>

<item>be defined by a simple, compact, formal specification that allows easy
implementation of MGML processors using standard formal-language
technology</item>

<item>exclude those portions of the SGML design which impair ease of
understanding, use, and portability</item>

<item>maintain compatibility with successful high-profile applications of SGML
such as the TEI, XXX insert others XXX, and HTML.</item>

</list></p>

<p>The MGML Master DSD defines a total of 20 elements and 15 attributes.
In printed form, it occupies only XXX pages.  Its official version is
currently being maintained by XXX, and an electronic form may be obtained via
FTP at XXX.</p>

<p>Also in that directory is a reference parser, implemented as two lex
modules, one C module, and one yacc module, comprising 835 lines of code.</p>

</sec>

<sec><title>The Specification of MGML</title>

<p>MGML is based on the Document Structure Definition (DSD).  A DSD is a set
of markup declarations that apply to all documents of a given structure.  The
required content and structure of a DSD are defined by the MGML Master DSD.
The behavior of a conforming MGML processor is defined by the list below and
by the commentary text attached to the structure definitions in the MGML
Master DSD.  These behavior specifications and the MGML Master DSD together
constitute the sole and complete definition of MGML.</p>

<p>A conforming MGML processor shall:

<list style="num">

<item>Optionally, for any DSD, write a corresponding SGML declaration and SGML
Document Type Definition that together define a class of documents including
all those accepted as valid by an MGML processor with respect to the DSD.  Thus,
every MGML document is an SGML document.</item>

<item>Scan the text of each element's content to distinguish markup and data.</item>

<item>Replace each entity reference with the text of the entity it names.</item>

<item>Validate the element and attribute structure against the model described
by the DSD.</item>

<item>Supply all defaulted attributes.</item>

<item>Provide to an external processing system (a) complete information about
the document's entity, element, and attribute structure and (b) access to the
document's content.</item>

</list></p>
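
<p>The entity-replacement, validation, and attribute-defaulting steps in the
list above can be sketched as follows.  This is a minimal illustration in
Python, not the reference parser.  It assumes the scanning step has already
produced a tree of elements, data, and entity references.  The names EntityRef,
Element, ElementRule, and process, the simplified content model (a set of
permitted child names), and the sample default value are all hypothetical and
are not taken from the MGML Master DSD.</p>

```python
from dataclasses import dataclass, field


@dataclass
class EntityRef:
    """An entity reference already recognized by the scanning step."""
    name: str


@dataclass
class Element:
    name: str
    attrs: dict = field(default_factory=dict)
    content: list = field(default_factory=list)  # str, EntityRef, or Element


@dataclass
class ElementRule:
    """Simplified, hypothetical stand-in for one element definition in a DSD."""
    children: set  # names of elements allowed as direct children
    attrs: dict    # attribute name -> default value, or None if no default


def process(elem, rules, entities):
    """Return a validated copy of elem with entities replaced and defaults supplied."""
    rule = rules.get(elem.name)
    if rule is None:
        raise ValueError(f"undeclared element: {elem.name}")

    # Validate attributes, then supply every default the instance omitted.
    for attr in elem.attrs:
        if attr not in rule.attrs:
            raise ValueError(f"undeclared attribute: {elem.name}/{attr}")
    attrs = dict(elem.attrs)
    for attr, default in rule.attrs.items():
        if default is not None:
            attrs.setdefault(attr, default)

    content = []
    for item in elem.content:
        if isinstance(item, EntityRef):
            # Replace the reference with the entity's text (treated as plain data).
            if item.name not in entities:
                raise ValueError(f"undeclared entity: {item.name}")
            content.append(entities[item.name])
        elif isinstance(item, Element):
            # Validate element structure: only declared children are allowed.
            if item.name not in rule.children:
                raise ValueError(f"{item.name} not allowed in {elem.name}")
            content.append(process(item, rules, entities))
        else:
            content.append(item)
    return Element(elem.name, attrs, content)


# Hypothetical rules and entity table for a tiny list structure.
rules = {
    "list": ElementRule(children={"item"}, attrs={"style": "bul"}),
    "item": ElementRule(children=set(), attrs={}),
}
entities = {"mgml": "Minimal Generalized Markup Language"}

doc = Element("list", content=[Element("item", content=["See ", EntityRef("mgml")])])
result = process(doc, rules, entities)
print(result.attrs)               # {'style': 'bul'}
print(result.content[0].content)  # ['See ', 'Minimal Generalized Markup Language']
```

<p>The sketch inserts entity text as plain data and checks only which child
elements are permitted, not their order or number.  A fuller processor would
need a richer content model and a rule for whether replacement text is scanned
again for markup.</p>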
</sec>
</paper>

