<show>
<head>
<showTitle>XML and Perl</showTitle>
<showLoc>The Perl Conference 2.0</showLoc>
</head>
<slide>
<title>Road Map</title>
<point>What is markup?</point>
<point>What is XML?</point>
<point>How does Unicode work?</point>
<point>Why should you use XML?</point>
<point>How can you use XML in Perl?</point>
</slide>
<slide>
<title>Who Am I?</title>
<point>Failed poet &amp; old database engineer</point>
<point>1986-89: New <i>Oxford English Dictionary</i> project</point>
<point>1989-96: Open Text, co-founder through IPO</point>
<point>Now: Independent, co-editor of the XML 1.0 spec, technical editor of XML.com, Seybold Fellow</point>
<point>tbray@textuality.com, +1-604-708-9592</point>
</slide>
<slide>
<title>Where To Get This Talk</title>
<bigURL>http://www.textuality.com/talks/px</bigURL>
</slide>
<slide><bigTitle>What is Markup?</bigTitle>
</slide>
<slide><title>Three Kinds of Markup</title>
<point>Presentational Markup</point>
<point>Procedural Markup</point>
<point>Descriptive Markup</point>
</slide>
<slide><title>No Markup</title>
<eg size="100">THREEKINDSOFMARKUPPR
ESENTATIONALMARKUPPR
OCEDURALMARKUPDESCRI
PTIVEMARKUP         
</eg>
</slide>
<slide><title>Procedural Markup</title>
<eg size="15">{\f1\fs48 Three Kinds of Markup\par }\pard \nowidctlpar
\widctlpar\adjustright {\f1 \par {\pntext\pard\plain\f1
\fs28\cgrid \hich\af1\dbch\af0\loch\f1 1.\tab}}\pard 
\fi-360\li360\nowidctlpar\widctlpar\jclisttab\tx360
{\*\pn \pnlvlbody\ilvl0\ls1\pnrnot0\pndec\pnstart1
\pnindent360\pnhang{\pntxta .}}\ls1\adjustright 
{\f1\fs28 Presentational \par {\pntext\pard\plain\f1\fs28
\cgrid \hich\af1\dbch\af0\loch\f1 2.\tab}}\pard \fi-360
\li360\nowidctlpar\widctlpar\jclisttab\tx360{\*\pn 
\pnlvlbody\ilvl0\ls1\pnrnot0\pndec\pnstart1\pnindent360 
\pnhang{\pntxta .}}\ls1\adjustright {\f1\fs28 Procedural
\par {\pntext\pard\plain\f1\fs28\cgrid \hich\af1\dbch\af0
\loch\f1 3.\tab}}\pard \fi-360\li360\nowidctlpar\widctlpar
\jclisttab\tx360{\*\pn \pnlvlbody\ilvl0\ls1\pnrnot0\pndec
\pnstart1\pnindent360\pnhang{\pntxta .}}\ls1\adjustright 
{\f1\fs28 Descriptive}{ \f1\fs28 \par }}</eg>
</slide>
<slide>
<title>Descriptive Markup</title>
<eg size="40"><![CDATA[<slide>
<title>Three Kinds of Markup</title>
<point>Presentational Markup</point>
<point>Procedural Markup</point>
<point>Descriptive Markup</point>
</slide>]]></eg>
</slide>
<slide><title>On Markup</title>
<point>In the old days (troff/TeX), markup was procedural and overt</point>
<point>In modern WP and DTP, markup is procedural and hidden</point>
<point>WYSIWYG is a lie!</point>
</slide>
<slide><title>Why Descriptive Markup is Good For You</title>
<point>Can repurpose for multiple uses</point>
<point>Can do clever search &amp; retrieval</point>
<point>You own it, not your vendor</point>
</slide>
<slide><bigTitle>What is XML?</bigTitle>
</slide>
<slide><title>This Isn't XML</title>
<eg size="50"><![CDATA[<TITLE>Suite in F</title>
This work of J.S.&nbsp;Bach is written to be played, not on the modern cello, but on the much softer-toned <i>viola da gamba</i>.
<p>An italian specimen of 1532:
<IMG src=vdg.jpg>]]></eg>
<point>no top-level element</point>
<point>&lt;TITLE> doesn't match &lt;/title></point>
<point>what's &amp;nbsp?</point>
<point>what does &lt;i> mean?</point>
<point>neither &lt;p> nor &lt;IMG> has an end-tag</point>
<point>no quotes on "vdg.jpg"</point>
</slide>
<slide><title>This is Well-Formed XML</title>
<eg size="50"><![CDATA[<HTML>
<title>Suite in F</title>
This work of J.S.&#xa0;Bach is written to be played, not on the modern cello, but on the much softer-toned <Instrument>viola da gamba</Instrument>.
<p>An italian specimen of 1532: 
<IMG src='vdg.jpg' /></p></HTML>]]></eg>
</slide>
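<slide><title>Sketch: Checking Well-Formedness in Perl</title>
<intro>A minimal sketch, not from the original slides: one way to test a string for well-formedness, assuming the CPAN module XML::Parser is installed and that its parse method dies on the first error. The helper well_formed and the shortened samples are invented for illustration.</intro>
<eg size="24" xml:space="preserve"><![CDATA[use strict;
use warnings;
use XML::Parser;

# Shortened versions of the two preceding examples (illustrative only).
my $bad  = "<TITLE>Suite in F</title>\n<p>A specimen of 1532:\n<IMG src=vdg.jpg>";
my $good = "<HTML><title>Suite in F</title>\n"
         . "<p>A specimen of 1532: <IMG src='vdg.jpg' /></p></HTML>";

# Hypothetical helper: 1 if the string parses, 0 if the parser dies.
sub well_formed {
    my ($xml) = @_;
    my $parser = XML::Parser->new;   # no handlers: syntax check only
    return eval { $parser->parse($xml); 1 } ? 1 : 0;
}

print "bad:  ", (well_formed($bad)  ? "well-formed" : "not well-formed"), "\n";
print "good: ", (well_formed($good) ? "well-formed" : "not well-formed"), "\n";]]></eg>
<caption>Checks well-formedness only; nothing here validates against a DTD</caption>
</slide>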
<slide><title>This is Valid XML</title>
<eg size="33" xml:space="preserve"><![CDATA[<!DOCTYPE HTML 
[ <!ELEMENT HTML       (title, p+)>
  <!ELEMENT title      (#PCDATA)>
  <!ELEMENT p          (#PCDATA|Instrument|IMG)*>
  <!ELEMENT Instrument (#PCDATA)>
  <!ELEMENT IMG        EMPTY>
  <!ATTLIST IMG        src CDATA #REQUIRED>
  <!ENTITY  nbsp       "&#xa0;"> ]>
<HTML><title>Suite in F</title>
<p>This work of J.S.&nbsp;Bach is written to be played, 
not on the modern cello, but on the much softer-toned 
<Instrument>viola da gamba</Instrument>.</p>
<p>An italian specimen of 1532: 
<IMG src='vdg.jpg' />
</p></HTML>]]></eg>
</slide>
<slide><title>Valid XML, but Better</title>
<eg size="40"><![CDATA[<!DOCTYPE HTML SYSTEM "html.dtd">
<HTML><title>Suite in F</title>
<p>This work of J.S.&nbsp;Bach is written to be played, not on the modern cello, but on the much softer-toned <Instrument>viola da gamba</Instrument>.</p>
<p>An italian specimen of 1532: <IMG src='vdg.jpg' />
</p></HTML>]]></eg>
</slide>
<slide><title>XML: History and Politics</title>
<point>In 1986, ISO approved an international standard for descriptive markup,
named SGML</point>
<point>In 1996, HTML was running out of steam...</point>
<point>... it looked like SGML had some of the answers...</point>
<point>... but SGML has technical and political problems...</point>
<point>SGML - (arcane features) + (new acronym) = XML!</point>
</slide>
<slide><title>XML: Design Goals</title>
<ol>
<point>XML shall be straightforwardly usable over the Internet</point>
<point>XML shall support a wide variety of applications</point>
<point>XML shall be compatible with SGML</point>
<point>It shall be easy to write programs which process XML documents</point>
<point>The number of optional features in XML is to be kept to the absolute minimum, ideally zero</point>
<point>XML documents should be human-legible and reasonably clear</point>
<point>The XML design should be prepared quickly</point>
<point>The design of XML shall be formal and concise</point>
<point>XML documents shall be easy to create</point>
<point>Terseness in XML markup is of minimal importance</point>
</ol>
</slide>
<slide>
<title>XML From 50,000 Feet</title>
<point>A meta-language for descriptive markup: you invent your own tags</point>
<point>Small spec: &lt; 40 pages</point>
<point>Built-in internationalization via Unicode</point>
<point>Built-in error-handling</point>
<point>Optimized for network operations</point>
<point>Tons of support from "the big boys."</point>
</slide>
<slide><title>Where to Find Out About XML</title>
<point>http://www.w3.org/xml</point>
<point>http://www.xml.com</point>
<point>http://www.sil.org/sgml/xml.html</point>
</slide>
<slide><title>XML Terminology 1</title>
<eg size='33' xml:space="preserve"><![CDATA[<!DOCTYPE show
 [ <!ENTITY home "http://www.xml.com">
   <!ENTITY W3C  "World Wide Web Consortium"> ]>
<show>
<p>A commentary on the &W3C;'s XML spec is at
<link href="&home;/xml/pub/axml">XML.com</link>
</p><p>Check it out!</p></show>]]></eg>
<point><tt>home</tt> and <tt>W3C</tt> are <i>entities</i></point>
<point><tt>&amp;home;</tt> and <tt>&amp;W3C;</tt> are <i>entity
references</i></point>
<point>There are four <i>elements</i>, of three <i>element types</i>:
<tt>show</tt>, <tt>link</tt>, and <tt>p</tt></point>
<point>There is one <i>attribute</i>, whose name is <tt>href</tt> and whose
value is <tt>http://www.xml.com/xml/pub/axml</tt></point>
</slide>
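<slide><title>Sketch: Counting the Terminology</title>
<intro>A rough sketch, assuming XML::Parser from CPAN with its Start and Char handlers: it tallies elements, element types, and attributes for the document above. The variable names are invented, and the sketch assumes entity references are expanded before the handlers see them.</intro>
<eg size="24" xml:space="preserve"><![CDATA[use strict;
use warnings;
use XML::Parser;

my $doc = <<'XML';
<!DOCTYPE show
 [ <!ENTITY home "http://www.xml.com">
   <!ENTITY W3C  "World Wide Web Consortium"> ]>
<show>
<p>A commentary on the &W3C;'s XML spec is at
<link href="&home;/xml/pub/axml">XML.com</link>
</p><p>Check it out!</p></show>
XML

my $elements = 0;        # every start-tag opens one element
my (%types, %attrs);     # element types seen; attribute name => value
my $text = '';           # character data, entity references expanded

my $parser = XML::Parser->new(Handlers => {
    Start => sub {
        my ($expat, $type, %attr) = @_;
        $elements++;
        $types{$type}++;
        %attrs = (%attrs, %attr);
    },
    Char => sub { $text .= $_[1] },
});
$parser->parse($doc);

print "$elements elements of ", scalar(keys %types), " types\n";
print "$_ = $attrs{$_}\n" for sort keys %attrs;]]></eg>
<caption>The parser reports elements and attributes; entities are resolved before the application sees them</caption>
</slide>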
<slide><title>XML Terminology 2</title>
<point>The XML spec defines <i>XML Document</i> and <i>XML Processor</i></point>
<point>An <i>XML Document</i> is anything that's "well-formed"</point>
<point>An <i>XML Processor</i> is a piece of software that reads XML on behalf
of an <i>application</i></point>
</slide>
<slide><bigTitle>How Does Unicode Work?</bigTitle>
</slide>
<slide><title>The Unicode Spectrum</title>
<pic href="Unicode.gif"/>
<point>Unicode == ISO 10646</point>
<point>38,886 16-bit characters (20,902 CJK)</point>
<point>Every character ever available on a computer</point>
<point>1 million "surrogate" characters</point>
</slide>
<slide><title>Unicode for Programmers</title>
<point>16-bit formats: UTF-16 and UCS-2 (wchar_t in C, char in Java)</point>
<point>8-bit format: UTF-8 (char in C)</point>
<point>Perl currently uses UTF-8 internally, can read UTF-16, ASCII,
ISO-8859-1, and UTF-8</point>
<point>Go to www.unicode.org and buy the book!</point>
</slide>
<slide><title>Unicode and XML</title>
<eg size="40" xml:space='preserve'>&lt;?xml version="1.0" encoding="ISO-8859-1" ?></eg>
<point>XML processors required to read UTF-8 and UTF-16!</point>
<point>Unfortunately, there's not much out there...</point>
<point>... but ASCII, EBCDIC, JIS, KOI8-R, Big5, etc. are all full of Unicode
characters ...</point> 
<point>... so they are legal XML too ...</point>
<point>... but you have to tell the processor!</point>
</slide>
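<slide><title>Sketch: Telling the Processor</title>
<intro>A small sketch of why the encoding declaration matters, assuming XML::Parser from CPAN on a Perl that marks its strings as UTF-8. The helper text_of and the one-word sample are invented; the same ISO-8859-1 bytes are parsed with and without the declaration.</intro>
<eg size="24" xml:space="preserve"><![CDATA[use strict;
use warnings;
use XML::Parser;

# "cafe" with an e-acute, as ISO-8859-1 bytes: the accent is the byte 0xE9.
my $body     = "<w>caf\xE9</w>";
my $declared = qq{<?xml version="1.0" encoding="ISO-8859-1" ?>$body};

# Hypothetical helper: the character data, or undef on a parse error.
sub text_of {
    my ($xml) = @_;
    my $text = '';
    my $parser = XML::Parser->new(
        Handlers => { Char => sub { $text .= $_[1] } });
    return eval { $parser->parse($xml); 1 } ? $text : undef;
}

my $with    = text_of($declared);   # processor was told: ISO-8859-1
my $without = text_of($body);       # processor assumes UTF-8

printf "declared:   %d characters, last is U+%04X\n",
    length $with, ord substr($with, -1);
print  "undeclared: ", (defined $without ? "parsed" : "rejected"), "\n";]]></eg>
<caption>Without the declaration the lone 0xE9 byte is not legal UTF-8</caption>
</slide>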
<slide>
<bigTitle>Why Should You Use XML?</bigTitle>
</slide>
<slide><bigTitle>Recently I Set Up a Linux Box</bigTitle></slide>
<slide><title>Recently I Set Up a Linux Box</title>
<eg size="33" xml:space="preserve">Section "Pointer"
    Protocol    "MouseMan"
    Device      "/dev/mouse"
# When using XQUEUE, comment out the above two lines, 
# and uncomment the following line.
#    Protocol   "Xqueue"

# ... parts left out

# ChordMiddle is an option for some 3-button Logitech mice
    ChordMiddle
EndSection</eg>
<caption>From XF86Config</caption>
</slide>
<slide><title>Recently I Set Up a Linux Box</title>
<eg size="33" xml:space="preserve">boot=/dev/hda
map=/boot/map
install=/boot/boot.b
prompt
timeout=100
other=/dev/hda1
        label=Win95
        table=/dev/hda
image=/boot/vmlinuz
        label=linux
        root=/dev/hda3
        read-only
</eg>
<caption>lilo.conf</caption>
</slide>
<slide><title>Recently I Set Up a Linux Box</title>
<eg size="33" xml:space="preserve">[homes]
   comment = Home Directories
   browseable = yes
   read only = no
   create mode = 0750

[printers]
   comment = All Printers
   browseable = no
   printable = yes
   public = no
</eg>
<caption>From smb.conf</caption>
</slide>
<slide><title>Recently I Set Up a Linux Box</title>
<eg size="18" xml:space='preserve'><![CDATA[#
# These are standard services.
#
ftp    stream  tcp nowait root /usr/sbin/tcpd  in.ftpd -l -a
telnet stream  tcp nowait root /usr/sbin/tcpd  in.telnetd
gopher stream  tcp nowait root /usr/sbin/tcpd  gn]]></eg>
<caption>From inetd.conf</caption>
</slide>
<slide><title>Recently I Set Up a Linux Box</title>
<eg size="24" xml:space='preserve'><![CDATA[OpaqueMoveSize 100
...
Style "Fvwm*"        NoTitle, Sticky, WindowListSkip
Style "Fvwm Pager"   StaysOnTop, NoHandles
Style "FvwmBanner"   StaysOnTop
...
AddToFunc "Move-or-Raise" "M" Move
+                         "M" Raise
+                         "C" Raise
+                         "D" Maximize 100 100
...
Mouse 1   F   A    Function "Resize-or-Raise"
...
*FvwmButtons(Title xfm, Icon Xfm.xpm, \
      Action 'Exec "Xfm" xfm -title "File Manager" &')]]></eg>
<caption>From fvwm2rc95</caption>
</slide>
<slide><title>About Syntax</title>
<point>Syntax is boring</point>
<point>Inventing syntax is a waste of time</point>
<point>Writing code to parse your own syntax is a waste of time</point>
<point>Learning a new syntax for each configuration file is a waste of
time</point>
<point>So stop wasting time and leave the syntax to XML!</point>
</slide>
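<slide><title>Sketch: A Configuration File in XML</title>
<intro>A sketch of what leaving the syntax to XML might look like: the lilo.conf settings recast as XML and read with XML::Parser (assumed installed from CPAN). The element and attribute names are invented for this example; no such format exists for lilo.</intro>
<eg size="24" xml:space="preserve"><![CDATA[use strict;
use warnings;
use XML::Parser;

# Hypothetical XML rendering of the lilo.conf shown earlier.
my $conf = <<'XML';
<lilo boot="/dev/hda" map="/boot/map" timeout="100">
  <other device="/dev/hda1" label="Win95" table="/dev/hda"/>
  <image file="/boot/vmlinuz" label="linux" root="/dev/hda3"/>
</lilo>
XML

my (%global, @entries);
my $parser = XML::Parser->new(Handlers => {
    Start => sub {
        my ($expat, $type, %attr) = @_;
        if ($type eq 'lilo') { %global = %attr }                        # top-level settings
        else                 { push @entries, { kind => $type, %attr } } # one boot entry
    },
});
$parser->parse($conf);

print "timeout: $global{timeout}\n";
print "$_->{kind}: $_->{label}\n" for @entries;]]></eg>
<caption>No hand-written tokenizer, comment rules, or continuation-line handling required</caption>
</slide>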
<slide><title>Why is the Web So Slow?</title>
<point>Browser task: render HTML</point>
<point>Server tasks: full-text search, database apps, session management,
network interface, template processing, etc. etc. etc.</point>
<point>To make the Web faster, run more code in the browser!</point>
</slide>
<slide><title>Today's Browser Architecture</title>
<pic href='BWall.gif' />
</slide>
<slide><title>The Document Object Model</title>
<intro>Replaces "Dynamic HTML"</intro>
<point>Language-independent</point>
<point>Browser-independent</point>
<point>OS-independent</point>
</slide>
<slide><title>The Next-Gen Browser</title>
<pic href='DOM.gif' />
</slide>
<slide><title>Metadata Model + XML Syntax = RDF</title>
<point>Commercial MIS systems are largely metadata-driven</point>
<point>The Web has no metadata - hence brute-force web robots</point>
<point>Coming soon from the W3C, Resource Description Framework (RDF): simple
data model, XML syntax, let 100 vocabularies bloom</point>
</slide>
<slide><title>Who Loves XML?</title>
<point>XML's attractiveness to a product vendor is in inverse proportion to
their market share</point>
<point>XML's attractiveness to people who just want pretty pages is not very
high</point>
<point>XML's attractiveness to people who invest a lot in creating information
is overwhelming</point>
</slide>
<slide><bigTitle>How Can You Use XML in Perl?</bigTitle>                   
</slide>
<slide><title>The expat XML Processor</title>
<pic href='p-expat.gif' />
<point><i>expat</i> written in C by James Clark</point>
<point>Blindingly fast</point>
<point>Stream-based callback API</point>
</slide>
<slide><title>The XML::Parser Package</title>
<point>No reliance in principle on <i>expat</i></point>
<point>Can send raw <i>expat</i> events to your own modules</point>
<point>Comes with a range of prepackaged handlers, invoked by the
<tt>Style</tt> argument</point>
<point>XML::Parser is best seen as a testbed for constructing APIs</point>
</slide>
<slide><title>Some Test Data</title>
<eg size="33"><![CDATA[<?xml version="1.0"?>
<tstmt><ttitle>The Old Testament</ttitle>
<fm>
<p>Source of original ASCII files unknown.</p>
<p>SGML markup by Jon Bosak, 1992-1994.</p>
<p>XML version by Jon Bosak, 1996-1998.</p>
<p>This work may be freely distributed internationally.</p>
</fm>
<book title="Genesis">
<bktlong>The First Book of Moses, Called GENESIS.</bktlong>
<chapter n="1">
<v n="1"><p>In the beginning God created the heaven and the earth.
</p></v>
</chapter></book></tstmt>]]></eg>
</slide>
<slide><title>The XML::Parser "Debug" Style</title>
<eg size='33' xml:space='preserve'><![CDATA[use XML::Parser;

$p = new XML::Parser Style => 'debug';

parsefile $p 'Beginning.xml';
=========================================
tstmt \\ ()
tstmt ttitle \\ ()
tstmt ttitle || The Old Testament
tstmt ttitle //
tstmt || #10;
tstmt fm \\ ()
tstmt fm || #10;
tstmt fm p \\ ()
tstmt fm p || Source of original ASCII files unknown.
tstmt fm p //
tstmt fm || #10;
tstmt fm p \\ ()
tstmt fm p || SGML markup by Jon Bosak, 1992-1994.
tstmt fm p //
tstmt fm || #10;
tstmt fm p \\ ()
tstmt fm p || XML version by Jon Bosak, 1996-1998.
tstmt fm p //
tstmt fm || #10;
tstmt fm p \\ ()
tstmt fm p || This work may be freely distributed internationally.
tstmt fm p //
tstmt fm || #10;
tstmt fm //
tstmt || #10;
tstmt book \\ (title Genesis)
tstmt book || #10;
tstmt book bktlong \\ ()
tstmt book bktlong || The First Book of Moses, Called GENESIS.
tstmt book bktlong //
tstmt book || #10;
tstmt book chapter \\ (n 1)
tstmt book chapter || #10;
tstmt book chapter v \\ (n 1)
tstmt book chapter v p \\ ()
tstmt book chapter v p || In the beginning God created the heaven and the earth.
tstmt book chapter v p || #10;
tstmt book chapter v p //
tstmt book chapter v //
tstmt book chapter || #10;
tstmt book chapter //
tstmt book || #10;
tstmt book //
tstmt || #10;
tstmt //]]></eg>
</slide>
<slide><title>The XML::Parser "subs" Style</title>
<point>For each start-tag <tt>&lt;foo></tt>, calls <tt>sub foo</tt></point>
<point>For each end-tag <tt>&lt;/foo></tt>, calls <tt>sub foo_</tt></point>
<point>For each chunk of text, calls <tt>sub characters</tt></point>
<point>Maintains element stack in <tt>@Context</tt></point>
</slide>
<slide><title>The XML::Parser "subs" Style</title>
<eg size="33"><![CDATA[use XML::Parser;

$p = new XML::Parser Style => 'subs';

parsefile $p 'Beginning.xml';

sub p { print "Para: @{$p->{Context}}\n"; }
sub characters { print "Text: $_[1]"; }
=========================================
Text: The Old TestamentText: 
Text: 
Para: tstmt fm
Text: Source of original ASCII files unknown.
Text: 
Para: tstmt fm
Text: SGML markup by Jon Bosak, 1992-1994.
Text: 
Para: tstmt fm
Text: XML version by Jon Bosak, 1996-1998.
Text: 
Para: tstmt fm
Text: This work may be freely distributed internationally.
Text: 
Text: 
Text: 
Text: The First Book of Moses, Called GENESIS.
Text: 
Text: 
Para: tstmt book chapter v
Text: In the beginning God created the heaven and the earth.
Text: 
Text: 
Text: 
Text:]]></eg>
</slide>
<slide><title>The XML::Parser "tree" Style</title>
<eg size="33" xml:space="preserve"><![CDATA[use XML::Parser;

$p = new XML::Parser Style => 'tree';

parsefile $p 'Beginning.xml';

require 'dumpvar.pl';
dumpvar('main', 'p');
=========================================
$p = XML::Parser=HASH(0x80fb144)
   'Parser' => 135366672
   'Pkg' => 'main'
   'RawEvents' => 'XML::Parser::Tree'
   'Style' => 'tree'
   'Tree' => ARRAY(0x80cec3c)
      0  'tstmt'
      1  ARRAY(0x8128690)
         0  HASH(0x8128678)
              empty hash
         1  'ttitle'
         2  ARRAY(0x81286f0)
            0  HASH(0x81286d8)
                 empty hash
            1  0
            2  'The Old Testament'
         3  0
         4  '
'
         5  'fm'
         6  ARRAY(0x81287c8)
            0  HASH(0x81287b0)
                 empty hash
            1  0
            2  '
'
            3  'p'
            4  ARRAY(0x811a11c)
               0  HASH(0x811a104)
                    empty hash
               1  0
               2  'Source of original ASCII files unknown.'
            5  0
            6  '
'
            7  'p'
            8  ARRAY(0x811a1f4)
               0  HASH(0x811a1dc)
                    empty hash
               1  0
               2  'SGML markup by Jon Bosak, 1992-1994.'
            9  0
            10  '
'
            11  'p'
            12  ARRAY(0x811a2cc)
               0  HASH(0x811a2b4)
                    empty hash
               1  0
               2  'XML version by Jon Bosak, 1996-1998.'
            13  0
            14  '
'
            15  'p'
            16  ARRAY(0x811a3a4)
               0  HASH(0x811a38c)
                    empty hash
               1  0
               2  'This work may be freely distributed internationally.'
            17  0
            18  '
'
         7  0
         8  '
'
         9  'book'
         10  ARRAY(0x812332c)
            0  HASH(0x811a4c4)
               'title' => 'Genesis'
            1  0
            2  '
'
            3  'bktlong'
            4  ARRAY(0x81233bc)
               0  HASH(0x81233a4)
                    empty hash
               1  0
               2  'The First Book of Moses, Called GENESIS.'
            5  0
            6  '
'
            7  'chapter'
            8  ARRAY(0x81234b8)
               0  HASH(0x8123494)
                  'n' => 1
               1  0
               2  '
'
               3  'v'
               4  ARRAY(0x812356c)
                  0  HASH(0x8123548)
                     'n' => 1
                  1  'p'
                  2  ARRAY(0x81235cc)
                     0  HASH(0x81235b4)
                          empty hash
                     1  0
                     2  'In the beginning God created the heaven and the earth.'
                     3  0
                     4  '
'
               5  0
               6  '
'
            9  0
            10  '
'
         11  0
         12  '
'
   'Userdata' => 135369360]]></eg>
</slide>
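<slide><title>Sketch: Walking the Tree</title>
<intro>A sketch of one way to walk the structure dumped above, assuming a CPAN release of XML::Parser in which parse returns the tree directly. The helper outline and the shortened sample are invented for illustration.</intro>
<eg size="24" xml:space="preserve"><![CDATA[use strict;
use warnings;
use XML::Parser;

my $xml = '<tstmt><ttitle>The Old Testament</ttitle>'
        . '<book title="Genesis"><chapter n="1"><v n="1">'
        . '<p>In the beginning</p></v></chapter></book></tstmt>';

# The tree is [ tag, content ]; content is
# [ attribute-hash, tag, content, tag, content, ... ] and tag 0 means text.
my $tree = XML::Parser->new(Style => 'Tree')->parse($xml);

# Hypothetical helper: collect element names, indented by depth.
sub outline {
    my ($tag, $content, $depth, $lines) = @_;
    my ($attrs, @kids) = @$content;
    my $label = join ' ', $tag, map { "$_=$attrs->{$_}" } sort keys %$attrs;
    push @$lines, ('  ' x $depth) . $label;
    while (my ($t, $c) = splice(@kids, 0, 2)) {
        outline($t, $c, $depth + 1, $lines) unless $t eq '0';   # skip text
    }
    return $lines;
}

print "$_\n" for @{ outline(@$tree, 0, []) };]]></eg>
<caption>Children come in (tag, content) pairs after the attribute hash</caption>
</slide>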
<slide><title>The XML::Parser "stream" Style</title>
<point>Calls <tt>sub StartTag</tt> for each start-tag,
<tt>sub EndTag</tt> for each end-tag</point>
<point>Calls <tt>sub Text</tt> for text</point>
<point><tt>$_</tt> is the text that was recognized</point>
<point>For <tt>StartTag</tt> and <tt>EndTag</tt>, <tt>$_[0]</tt> is the
element type</point>
<point>For <tt>StartTag</tt>, <tt>%_</tt> is a hash of attribute values by
name</point>
<point>Default action for all callbacks is <tt>print;</tt></point>
</slide>
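<slide><title>Sketch: The Stream Callbacks at Work</title>
<intro>A sketch of the callbacks in action, assuming a current CPAN release of XML::Parser, where each callback receives the parser object first and the element type second; check the argument order against your installed version. It also assumes text inside p is reported once p has closed, so the innermost open element is v.</intro>
<eg size="24" xml:space="preserve"><![CDATA[use strict;
use warnings;
use XML::Parser;

my $xml = <<'XML';
<book title="Genesis"><chapter n="1">
<v n="1"><p>In the beginning God created the heaven and the earth.
</p></v>
</chapter></book>
XML

my ($book, $chapter, $verse, @lines);

# %_ holds the attributes of the start-tag being reported.
sub StartTag {
    my ($expat, $type) = @_;
    $book    = $_{title} if $type eq 'book';
    $chapter = $_{n}     if $type eq 'chapter';
    $verse   = $_{n}     if $type eq 'v';
}

sub EndTag { }   # defined so the default print does not fire

# $_ holds the text accumulated since the previous tag.
sub Text {
    my ($expat) = @_;
    return unless $expat->in_element('v');
    (my $text = $_) =~ s/\s+\z//;    # drop the trailing newline
    push @lines, "$book $chapter:$verse $text" if length $text;
}

XML::Parser->new(Style => 'Stream', Pkg => 'main')->parse($xml);
print "$_\n" for @lines;]]></eg>
<caption>Attributes arrive in %_, text in $_</caption>
</slide>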
<slide><title>Some Larger Test Data</title>
<eg size="33" xml:space="preserve"><![CDATA[<?xml version="1.0"?>
<!DOCTYPE tstmt SYSTEM "tstmt.dtd">

<tstmt><ttitle>The New Testament</ttitle>
<fm>
<p>Source of original ASCII files unknown.</p>
<p>SGML markup by Jon Bosak, 1992-1994.</p>
<p>XML version by Jon Bosak, 1996-1998.</p>
<p>This work may be freely distributed internationally.
</p></fm>
<book>
<bktlong>The Gospel According to SAINT MATTHEW.</bktlong>
<bktshort>Matthew</bktshort>
<chapter><chtitle>Chapter 1</chtitle>
<v><vn>1</vn><p>The book of the generation of Jesus 
Christ, the son of David, the son of Abraham.
</p></v>]]></eg>
<point>... and so on: 1,170,010 bytes in total</point>
<point>Note that verse-numbers are elements, not attributes</point>
</slide>
<slide><title>The "Stream" Style: Assignment 1</title>
<intro>Turn the verse numbers into attributes</intro>
<eg size="33" xml:space="preserve"><![CDATA[use XML::Parser;
my $p = new XML::Parser Style => 'stream';
parsefile $p $ARGV[0];

sub StartTag {
    if ($_[0] eq "vn") { print "<v n='"; }
    elsif ($_[0] ne "v") { print; }
}

sub EndTag {
    if ($_[0] eq "vn") { print "'>"; }
    else { print; }
}]]></eg>
</slide>
<slide><title>The "Stream" Style: Assignment 1</title>
<intro>But There's More Than One Way To Do It!</intro>
<eg size="33" xml:space="preserve"><![CDATA[use XML::Parser;
my $p = new XML::Parser Style => 'stream';
parsefile $p $ARGV[0];

sub StartTag {
    if (/<vn/) { print "<v n='"; }
    elsif (!/<v>/) { print; }
}

sub EndTag {
    if (/<\/vn/) { print "'>"; }
    else { print; }
}]]></eg>
</slide>
<slide><title>The "Stream" Style: Assignment 2</title>
<intro>It seems that there's one paragraph per verse (stupid); is this true?</intro>
<eg size="33" xml:space="preserve"><![CDATA[use XML::Parser;
my $p = new XML::Parser Style => 'stream';
parsefile $p $ARGV[0];

sub StartTag {
    if ($_[0] eq "v") { $pCount = 0; }
    elsif ($_[0] eq "p" && grep(/^v$/, @{$p->{Context}})) { $pCount++ };
}
sub EndTag {
    if ($pCount > $maxPCount) { $maxPCount = $pCount; }
}
sub Text { }
sub EndDocument
{
    print "Max P's per V: $maxPCount\n";
}]]></eg>
</slide>
<slide><title>The "Stream" Style: Assignment 3</title>
<intro>OK, lose the superfluous paragraph tags</intro>
<eg size="33" xml:space="preserve"><![CDATA[use XML::Parser;
my $p = new XML::Parser Style => 'stream';
parsefile $p $ARGV[0];

sub StartTag {
    unless ($_[0] eq "p" && grep(/^v$/, @{$p->{Context}})) { print };
}
sub EndTag {
    unless ($_[0] eq "p" && grep(/^v$/, @{$p->{Context}})) { print };
}]]></eg>
</slide>
<slide><title>The "Stream" Style: Assignment 3</title>
<intro>There's More Than One Way To Do It!</intro>
<eg size="33" xml:space="preserve"><![CDATA[use XML::Parser;
my $p = new XML::Parser Style => 'stream';
parsefile $p $ARGV[0];

sub StartTag {
    unless (/<p>/ && $p->{Context}[-1] eq "v") { print };
}
sub EndTag {
    unless (/<.p>/ && $p->{Context}[-1] eq "v") { print };
}]]></eg>
</slide>
<slide><title>The "Stream" Style: Assignment 4</title>
<intro>Also, lose the useless trailing newline</intro>
<eg size="33" xml:space="preserve"><![CDATA[use XML::Parser;
my $p = new XML::Parser Style => 'stream';
parsefile $p $ARGV[0];

sub Text
{
    chomp if ($p->{Context}[-1] eq "v");
    print;
}]]></eg>
</slide>
<slide><title>The "Stream" Style: Assignment 5</title>
<intro>Find Jesus!</intro>
<eg size="33" xml:space="preserve"><![CDATA[use XML::Parser;
my $p = new XML::Parser Style => 'stream';
parsefile $p $ARGV[0];

sub Text     { $J++ if (/Jesus/ && $p->{Context}[-1] eq "v") }
sub StartTag { $V++ if ($_[0] eq "v"); }
sub EndTag   { } # default is to print, remember
sub EndDocument { print "$J of $V verses mention Jesus\n"; }]]></eg>
</slide>
<slide><title>The "Stream" Style: Final Assignment</title>
<intro>Build a glossary of terms in the XML specification</intro>
<eg size="33" xml:space="preserve"><![CDATA[<termdef
id="dt-app" term="Application">It is assumed that an XML 
processor is doing its work on behalf of another module, 
called the <term>application</term>.</termdef>]]></eg>
</slide>
<slide><title>The "Stream" Style: Final Assignment</title>
<intro>Build a glossary of terms in the XML specification</intro>
<eg size="33" xml:space="preserve"><![CDATA[use XML::Parser;
my $p = new XML::Parser Style => 'stream';
parsefile $p $ARGV[0];

sub StartDocument {
    print "<html><head><title>List of Terms</title></head>\n";
    print "<body><h2>List of Terms</h2>\n<dl>\n";
}

sub StartTag {
    if (/<termdef/) { print "<dt>$_{term}</dt><dd>"; }
    elsif (/<term>/) { print "<b>"; }
    # some termdefs include grammar productions, sigh
    elsif (/<prod/ && 
           grep(/^termdef$/, @{$p->{Context}})) 
    { print "<br><tt>"; }
}

sub EndTag {
    if (/<.termdef/) { print "</dd>\n"; }
    elsif (/<.term>/) { print "</b>"; }
    elsif (/<.prod/ && 
           grep(/^termdef$/, @{$p->{Context}})) 
    { print "</tt>"; }
    elsif (/<.lhs/ && 
           grep(/^termdef$/, @{$p->{Context}})) 
    { print " ::= "; }
}

sub Text {
    if (grep(/^termdef$/, @{$p->{Context}})
        && !grep(/^head$/, @{$p->{Context}}))
    {
        s/&/&amp;/g;
        s/</&lt;/g;
        print;
    }
}
sub EndDocument { print "</dl></body></html>\n"; }
</slide>
<slide><title>And A Parting Question</title>
<eg size="75">
print if (/perl is terrific/i);

</eg>
<caption>What matches this?</caption>
</slide>
<slide><title>Match?</title>
<eg size="50"><![CDATA[
perl<!-- capitalize? --> is terrific]]>

</eg>
</slide>
<slide><title>Match?</title>
<eg size="40"><![CDATA[
Perl is<?AUDIO heavenly chorus ?> terrific

]]></eg>
</slide>
<slide><title>Match?</title>
<eg size="40"><![CDATA[
Perl<fnote isbn="1-56592-149-6" /> is terrific

]]></eg>
</slide>
<slide><title>Match?</title>
<eg size="40"><![CDATA[
Perl is <emph>not so</emph> terrific, said 

]]></eg>
</slide>
<slide><title>Match?</title>
<eg size="40"><![CDATA[
Perl <quot>is terrific</quot>, commented Bray

]]></eg>
</slide>
<slide><title>Match?</title>
<eg size="40"><![CDATA[<!ENTITY adjective "terrific">
...
perl is &adjective;!]]></eg>
</slide>
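<slide><title>Sketch: Matching on Character Data</title>
<intro>One possible answer, offered as a sketch rather than the answer: match against character data only, letting the XML processor drop comments, processing instructions, and tags and expand entities. Assumes XML::Parser from CPAN; the helper text_only and the wrapping s element are invented.</intro>
<eg size="24" xml:space="preserve"><![CDATA[use strict;
use warnings;
use XML::Parser;

# Hypothetical helper: concatenate character data, ignoring all markup.
sub text_only {
    my ($xml) = @_;
    my $text = '';
    XML::Parser->new(Handlers => { Char => sub { $text .= $_[1] } })
               ->parse($xml);
    return $text;
}

# The fragments from the Match? slides, wrapped in an invented <s> element.
my @samples = (
    '<s>perl<!-- capitalize? --> is terrific</s>',
    '<s>Perl is<?AUDIO heavenly chorus ?> terrific</s>',
    '<s>Perl<fnote isbn="1-56592-149-6" /> is terrific</s>',
    '<s>Perl is <emph>not so</emph> terrific</s>',
    '<s>Perl <quot>is terrific</quot>, commented Bray</s>',
    '<!DOCTYPE s [<!ENTITY adjective "terrific">]><s>perl is &adjective;!</s>',
);

for my $xml (@samples) {
    my $hit = text_only($xml) =~ /perl is terrific/i ? 'match' : 'no match';
    print "$hit: $xml\n";
}]]></eg>
<caption>Whether a match across element boundaries should count is still up to the application</caption>
</slide>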
</show>

