IT Services, University of Oxford

1. Topics to cover

  1. Refreshing our knowledge of XSLT and XPath
  2. Some less obvious bits of XSLT
  3. ODD for profit
  4. FAQ: your problems here

2. Refreshing our knowledge of XSLT and XPath

  • XPath as the universal glue
  • XSLT processing model
  • Differences between XSLT 1.0 and 2.0
  • HTML: remembering what to leave to CSS and Javascript
  • XSLT writing XSLT

2.1. Ways of using XSLT

We'll bear in mind four ways in which XSLT may be used within your project:
  1. to create web pages, dynamically or statically, showing your content
  2. to convert your TEI XML into other formats (examples: LaTeX for typesetting, OOXML for using in Word, ePub for ebooks, RSS for syndication, RDF for open data exchange)
  3. to report on (‘how many <msDesc> are there which have 2 authors?’), and validate (‘report all <author> elements which have only text content, not structured forename and surname’), your TEI XML
  4. to change your TEI XML (‘remove punctuation between elements in <biblStruct>’, ‘replace all occurrences of Driscoll, M with <forename>Driscoll</forename><surname>Matthew</surname>’)

2.2. But first, XPath

XPath is the basis of most other XML querying and transformation languages.

  • It is a syntax for accessing parts of an XML document
  • It uses a path structure to define XML elements
  • It has a library of standard functions
  • It is a W3C Standard and one of the main components of XQuery and XSLT

2.3. Example text

<body n="anthology">
 <div type="poem">
  <head>The SICK ROSE </head>
  <lg type="stanza">
   <l n="1">O Rose thou art sick.</l>
   <l n="2">The invisible worm,</l>
   <l n="3">That flies in the night </l>
   <l n="4">In the howling storm:</l>
  </lg>
  <lg type="stanza">
   <l n="5">Has found out thy bed </l>
   <l n="6">Of crimson joy:</l>
   <l n="7">And his dark secret love </l>
   <l n="8">Does thy life destroy.</l>
  </lg>
 </div>
</body>

2.30. XPath: More About Paths

  • A location path results in a node-set
  • Paths can be absolute (/div/lg[1]/l)
  • Paths can be relative (l/../../head)
  • Formal Syntax: (axisname::nodetest[predicate])
  • For example: child::div[contains(head, 'ROSE')]

2.31. XPath: Axes

ancestor::
Contains all ancestors (parent, grandparent, etc.) of the current node
ancestor-or-self::
Contains the current node plus all its ancestors (parent, grandparent, etc.)
attribute::
Contains all attributes of the current node
child::
Contains all children of the current node
descendant::
Contains all descendants (children, grandchildren, etc.) of the current node
descendant-or-self::
Contains the current node plus all its descendants (children, grandchildren, etc.)

2.32. XPath: Axes (2)

following::
Contains everything in the document after the closing tag of the current node
following-sibling::
Contains all siblings after the current node
parent::
Contains the parent of the current node
preceding::
Contains everything in the document that is before the starting tag of the current node
preceding-sibling::
Contains all siblings before the current node
self::
Contains the current node

2.33. Axis examples

  • ancestor::lg = all <lg> ancestors
  • ancestor-or-self::div = all <div> ancestors or current
  • attribute::n = n attribute of current node
  • child::l = <l> elements directly under current node
  • descendant::l = <l> elements anywhere under current node
  • descendant-or-self::div = all <div> descendants or current
  • following-sibling::l[1] = next <l> element at this level
  • preceding-sibling::l[1] = previous <l> element at this level
  • self::head = current <head> element

2.34. XPath: Predicates

  • child::lg[attribute::type='stanza']
  • child::l[@n='4']
  • child::div[position()=3]
  • child::div[4]
  • child::l[last()]
  • child::lg[last()-1]

2.35. XPath: Abbreviated Syntax

  • ‘nothing’ is the same as child::, so lg is short for child::lg
  • @ is the same as attribute::, so @type is short for attribute::type
  • . is the same as self::, so ./head is short for self::node()/child::head
  • .. is the same as parent::, so ../lg is short for parent::node()/child::lg
  • // is the same as descendant-or-self::, so div//l is short for child::div/descendant-or-self::node()/child::l

2.36. Simple complete XSLT

<xsl:stylesheet version="2.0"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <xsl:template match="TEI">
  <html>
   <head>
    <title>My document</title>
   </head>
   <body>
    <xsl:apply-templates select="text"/>
   </body>
  </html>
 </xsl:template>
 <xsl:template match="div">
  <h1>
   <xsl:sequence select="head"/>
  </h1>
  <xsl:apply-templates select="*[not(self::head)]"/>
 </xsl:template>
</xsl:stylesheet>

2.37. Example of context-dependent matches

Compare
<xsl:template match="head"> .... </xsl:template>
with
<xsl:template match="div/head"> ... </xsl:template>
<xsl:template match="figure/head"> .... </xsl:template>

2.38. Priorities when templates conflict

It is possible for it to be ambiguous which template is to be used:
<xsl:template match="person/name">
</xsl:template>
<xsl:template match="name"></xsl:template>
when the processor meets a <name>, which template is used?

2.39. Solving priorities

There is a priority attribute on <xsl:template>; when several templates match the same node, the one with the highest priority value is used:
<xsl:template match="name" priority="1">
 <xsl:apply-templates/>
</xsl:template>
<xsl:template match="person/name" priority="2"> A name </xsl:template>

2.40. Template priority generally

The more normal rule is that the most specific template wins.
<xsl:template match="*">
<!-- ... -->
</xsl:template>
<xsl:template match="tei:*">
<!-- ... -->
</xsl:template>
<xsl:template match="p">
<!-- ... -->
</xsl:template>
<xsl:template match="div/p">
<!-- ... -->
</xsl:template>
<xsl:template match="div/p/@n">
<!-- ... -->
</xsl:template>
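
"Most specific wins" is implemented through default priorities that XSLT assigns to each kind of pattern. The sketch below annotates patterns like those above with the default values; the output strings and the tei prefix binding are assumptions added for illustration.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:tei="http://www.tei-c.org/ns/1.0"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0">
 <xsl:output method="text"/>
 <!-- default priority -0.5: any element at all -->
 <xsl:template match="*">
  <xsl:apply-templates select="*"/>
 </xsl:template>
 <!-- default priority -0.25: any element in one namespace -->
 <xsl:template match="tei:*">
  <xsl:apply-templates select="*"/>
 </xsl:template>
 <!-- default priority 0: a plain element name -->
 <xsl:template match="p">p outside a div&#10;</xsl:template>
 <!-- default priority 0.5: a pattern with more than one step, or with a predicate -->
 <xsl:template match="div/p">p inside a div&#10;</xsl:template>
</xsl:stylesheet>
```

A <p> directly inside a <div> matches all four templates, and the one with priority 0.5 is chosen. An explicit priority attribute overrides these defaults.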

2.41. Pushing and pulling

XSLT stylesheets can be characterized as being of two types:
push
In this type of stylesheet, there is a different template for every element, communicating via <xsl:apply-templates>, and the overall result is assembled from bits in each template. It is sometimes hard to visualize the final design. Common for document-oriented processing where the input document structure varies.
pull
In this type, there is a master template (usually matching /) with the main structure of the output, and specific <xsl:for-each> or <xsl:sequence> commands to grab what is needed for each part. The templates tend to get large and unwieldy. Common for data-oriented processing where the structure is fixed.

2.42. Feature: for-each

If we want to avoid lots of templates, we can do in-line looping over a set of elements. For example:
<xsl:template match="listPerson">
 <ul>
  <xsl:for-each select="person">
   <li>
    <xsl:sequence select="persName"/>
   </li>
  </xsl:for-each>
 </ul>
</xsl:template>
contrast with
<xsl:template match="listPerson">
 <ul>
  <xsl:apply-templates select="person"/>
 </ul>
</xsl:template>
<xsl:template match="person">
 <li>
  <xsl:sequence select="persName"/>
 </li>
</xsl:template>

2.43. Attribute value template

What if we want to turn
<ref target="http://www.oucs.ox.ac.uk/">OUCS</ref>
into
<xhtml:a href="http://www.oucs.ox.ac.uk/"/>
? What we cannot do is
<xsl:template match="ref">
 <a href="@target">
  <xsl:apply-templates/>
 </a>
</xsl:template>

This will give the href attribute the value ‘@target’.

2.44. For example

Instead we use {} to indicate that the expression must be evaluated:
<xsl:template match="ref">
 <a href="{@target}">
  <xsl:apply-templates/>
 </a>
</xsl:template>

This will give the href attribute whatever value the attribute target has.
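
An attribute value template may mix fixed text with several {} expressions, and doubled braces give literal braces. The following is a minimal sketch; the <graphic> handling, the images/ path prefix and the data-note attribute are hypothetical and only there to show the syntax.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0">
 <xsl:template match="graphic">
  <!-- fixed text and expressions can be combined in one attribute -->
  <img src="images/{@url}"
       alt="{normalize-space(desc)}"
       title="Figure {count(preceding::graphic) + 1}"/>
 </xsl:template>
 <xsl:template match="ref">
  <!-- {{ and }} are written to the output as single literal braces -->
  <a href="{@target}" data-note="{{not evaluated}}">
   <xsl:apply-templates/>
  </a>
 </xsl:template>
</xsl:stylesheet>
```

Braces are only needed in attributes of literal result elements and in those XSLT attributes defined as value templates; a select or test attribute is already an XPath expression and takes no braces.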

2.45. From XSLT 1.0 to XSLT 2.0

Points to remember

  • At the time of writing, the current version of XSLT is 2.0; version 3.0 is in preparation
  • Perl, PHP and browser-based implementations almost all use libxslt, which is 1.0 only
  • Any Java-based environment (web servlet, Ant, xproc, oXygen, command-line) can use Saxon, which implements 2.0 fully

2.46. The big deals in XSLT 2.0 (in my view)

Grouping
You can identify and process groups in your source
Output files
You can write as many output files as you want
Creating and processing node sets
You can save temporary trees and re-process them
Functions
You can define your own functions, much less verbose than templates
Default namespace
Save yourself loads of typing
Regular expression processing
very convenient for text handling
XPath 2.0
Lots of new functionality you can use

2.47. XPath extra functionality

You may like to use if ... then in your XPath expression.

To choose the name of an HTML output element based on value of @type on <list>:

<xsl:template match="list">
 <xsl:element
   name="{if (@type='gloss') then 'dl' else if (@type='ordered') then 'ol' else 'ul'}">

  <xsl:apply-templates/>
 </xsl:element>
</xsl:template>

You may also like to use more functions in your XPath:

<xsl:template match="/">
 <xsl:sequence
   select="string-join(distinct-values(doc('index.xml')//gi),'===')"/>

</xsl:template>
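
XPath 2.0 also adds for, some and every expressions. The sketch below shows one of each; the particular questions asked of the document (paragraphs per <div>, gloss lists, headed divisions) are illustrative assumptions, not part of the original slides.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0">
 <xsl:output method="text"/>
 <xsl:template match="/">
  <!-- for ... return builds a sequence: one string per div -->
  <xsl:value-of
    select="for $d in //div
            return concat(count($d//p), ' paragraphs in div ', $d/@xml:id)"
    separator="&#10;"/>
  <xsl:text>&#10;</xsl:text>
  <!-- some ... satisfies: true if at least one list is a gloss list -->
  <xsl:value-of select="some $l in //list satisfies $l/@type = 'gloss'"/>
  <xsl:text>&#10;</xsl:text>
  <!-- every ... satisfies: true only if all divs have a head -->
  <xsl:value-of select="every $d in //div satisfies exists($d/head)"/>
  <xsl:text>&#10;</xsl:text>
 </xsl:template>
</xsl:stylesheet>
```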

2.48. Default namespace

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0">

 <xsl:template match="p"/>
</xsl:stylesheet>
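
For comparison, here is a sketch of what the same matching requires in XSLT 1.0, where there is no xpath-default-namespace and every element name in a pattern or path needs a prefix. The template body is a made-up example.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:tei="http://www.tei-c.org/ns/1.0"
  exclude-result-prefixes="tei">
 <!-- an unprefixed name here would match elements in no namespace only -->
 <xsl:template match="tei:div/tei:p">
  <p>
   <xsl:apply-templates select="tei:hi | text()"/>
  </p>
 </xsl:template>
</xsl:stylesheet>
```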

2.49. Output files

<xsl:template match="msDesc">
 <xsl:result-document href="{@n}.html">
  <xsl:apply-templates/>
 </xsl:result-document>
</xsl:template>

2.50. Temporary node sets, modes

<xsl:template match="TEI">
 <xsl:variable name="pass0">
  <xsl:apply-templates mode="pass0"/>
 </xsl:variable>
 <xsl:apply-templates select="$pass0"/>
</xsl:template>
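
A fuller two-pass sketch along the same lines: the first pass (mode pass0) copies the document while dropping deletions, and the second pass formats the cleaned copy. The choice of <del> as the thing to remove and of <p> as the thing to format are assumptions made for the example.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0">
 <!-- pass 0: copy everything ... -->
 <xsl:template match="@*|node()" mode="pass0">
  <xsl:copy>
   <xsl:apply-templates select="@*|node()" mode="pass0"/>
  </xsl:copy>
 </xsl:template>
 <!-- ... except deletions, which are dropped -->
 <xsl:template match="del" mode="pass0"/>

 <xsl:template match="TEI">
  <!-- the variable holds a temporary tree built by the first pass -->
  <xsl:variable name="pass0">
   <xsl:apply-templates mode="pass0"/>
  </xsl:variable>
  <!-- second pass: default mode, run over the temporary tree -->
  <xsl:apply-templates select="$pass0/text"/>
 </xsl:template>
 <xsl:template match="p">
  <p>
   <xsl:value-of select="normalize-space(.)"/>
  </p>
 </xsl:template>
</xsl:stylesheet>
```

Keeping each pass in its own mode means the two sets of templates cannot interfere with each other.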

2.51. HTML, CSS and Javascript

Your final web page consists (probably) of
  • the body of HTML created by running a transformation on your TEI XML
  • standard navigation, search box, header, footer, menu items authored in HTML
  • one or more CSS files controlling look and feel
  • one or more Javascript scripts which do something clever (like providing sortable tables)
So
  • XSLT transforms one document model to another
  • CSS decorates the HTML document
  • Javascript changes the HTML document dynamically

2.52. Ways of using HTML

Contrast:
<h1>6. Introduction to
<b>R</b>
</h1>
with
<style type="text/css">
 h1 { counter-increment: div1; }
 h1:before { content: counter(div1) ". "; }
 span.package { font-weight: bold; }
</style>
....

<h1>Introduction to <span class="package">R</span>
</h1>

2.53. Creating XSLT with XSLT

Not as hard as it sounds, but it needs a crucial instruction, <xsl:namespace-alias>:
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xout="http://www.w3.org/1999/XSL/TransformAlias"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0">

 <xsl:output method="xml" indent="yes"/>
 <xsl:namespace-alias stylesheet-prefix="xout" result-prefix="xsl"/>
 <xsl:template match="/">
  <xout:stylesheet version="2.0"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0">
   <xout:template match="/">
    <xout:apply-templates/>
   </xout:template>
  </xout:stylesheet>
 </xsl:template>
</xsl:stylesheet>

2.54. Generating templates for values of @rend

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xout="http://www.w3.org/1999/XSL/TransformAlias"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0">

 <xsl:output method="xml" indent="yes"/>
 <xsl:namespace-alias stylesheet-prefix="xout" result-prefix="xsl"/>
 <xsl:template match="/">
  <xout:stylesheet version="2.0"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0">
   <xsl:for-each
     select="distinct-values(//*[@rend]/concat(name(),'+',@rend))">
    <xout:template
      match="{substring-before(.,'+')}[@rend='{substring-after(.,'+')}']">
     <xout:apply-templates/>
    </xout:template>
   </xsl:for-each>
  </xout:stylesheet>
 </xsl:template>
</xsl:stylesheet>

3. Some less obvious bits of XSLT

  • How to do grouping
  • Working with functions
  • Sequences and the newer XPath functions
  • Stylesheet profiling using Saxon

3.1. Some diary data

<div xml:id="g1798" type="dYear">
 <div type="dMonth" xml:id="g1798-01">
  <ab type="dDay" xml:id="g1798-01-01">
   <date when="1798-01-01">Jan. 1. 1798.
       M.</date>
   <ref type="dText" subtype="read" target="/bibl/te0807.html">Burke's
       3<hi rend="sup">rd</hi> Letter, p. 34</ref>: <ref type="dText" subtype="read" target="/bibl/te0808.html">Rival Queens, acts 1, 2, 3</ref>. <seg type="dMeeting" subtype="CG">
    <persName ref="/people/FAW01.html">Fawcet</persName> calls</seg>: <seg type="dMeal" subtype="SG">
    <persName ref="/people/MAR01.html">M</persName> sups</seg>. <seg type="dMeeting" subtype="M">meet <persName>Barnes</persName>
   </seg>.</ab>
  <ab type="dDay" xml:id="g1798-01-02">
   <date when="1798-01-02">2. Tu.</date>
   <ref type="dWrote" subtype="write" target="/works/leon01.html">O. M., p. 2, 3</ref>.
  <ref type="dText" subtype="read" target="/bibl/te0810.html">Burke's Memorials,
       p. 40</ref>. <seg type="dMeeting" subtype="CG">
    <persName ref="/people/COO05.html">Miss Cooper</persName>
   </seg>, <seg type="dMeeting" subtype="CG">
    <persName ref="/people/HOL10.html">mrs Cole</persName>
   </seg>, <seg type="dMeeting" subtype="CG">
    <persName ref="/people/HOL06.html">F
         Ht</persName>
   </seg> &amp; <seg type="dMeeting" subtype="CG">
    <persName ref="/people/FEN01.html">F</persName> call</seg>: <seg type="dMeal" subtype="D">dine at <persName ref="/people/JOH01.html">
     <placeName type="venue">Johnson's</placeName>
    </persName>, w. <persName ref="/people/FUS01.html">Fuseli</persName> &amp; <persName>Wilkinson</persName>. </seg>
   <seg type="dMeeting" subtype="See">
    <persName ref="/people/CAR01.html">Carlisle</persName> &amp; <persName ref="/people/COM01.html">Combe</persName>
   </seg>.</ab>
  <ab type="dDay" xml:id="g1798-01-03">
   <date when="1798-01-03">3. W.</date>
   <ref type="dText" subtype="read" target="/bibl/te0810.html">Memorials, p. 122</ref>.
  <seg type="dMeeting" subtype="CG">
    <persName ref="/people/COM01.html">Combe</persName>
   </seg> &amp; <seg type="dMeeting" subtype="CG">
    <persName ref="/people/WHI03.html">White</persName> call</seg>: <seg type="dMeeting" subtype="C">call on <persName ref="/people/LES02.html" type="nah">
     <placeName type="venue">Leslie</placeName>
    </persName> n</seg>, <seg type="dMeeting" subtype="C">
    <persName ref="/people/KEA01.html" type="nah">
     <placeName type="venue">Kearsley</placeName>
    </persName> n</seg>, &amp; <seg type="dMeeting" subtype="C">
    <persName ref="/people/NIC01.html" type="nah">
     <placeName type="venue">Nicholson</placeName>
    </persName> n</seg>. <ref type="dEntertainment" subtype="Theat" target="/plays/cast01.html">
    <placeName type="DL"/>Theatre, 3/10 Castle Spectre</ref>.</ab>
 </div>
</div>

3.2. Some reverse engineering (1)

There is a calendar with links to each day. What XSLT commands might have been used to do this?

3.3. Sorting (1)

The element <xsl:for-each> is central to generating indexes such as the calendar from the previous slide.

<xsl:sort> is used to determine the order of each item matched

<ab type="dDay" xml:id="g1800-01-01">
 <date when="1800-01-01">1800. Jan. 1. W.</date>
</ab>
<xsl:for-each select="//ab[@type='dDay']">
 <xsl:sort select="date/@when" order="ascending"/>
 <!-- value-of writes the attribute's value as text; xsl:sequence would try to attach the attribute node itself to the output -->
 <xsl:value-of select="date/@when"/>: <xsl:sequence select="date"/>
 <br/>
</xsl:for-each>
1797-01-01: Jan. 1. Su.
1797-01-02: 2. M.
1797-01-03: 3. Tu.
1797-01-04: 4. W.
1797-01-05: 5. Th.
1797-01-06: 6. F.

3.4. Sorting (2)

More than one sorting condition can be expressed.

Let's create an index of people at meals. First sort them by date, then alphabetically.

<ab type="dDay" xml:id="g1800-01-02">
 <date when="1800-01-01">1800. Jan. 2. W.</date>
 <seg type="dMeal">Sup at <persName>Fell's</persName>.</seg>
</ab>
<xsl:for-each
  select="//persName[ancestor::seg[@type='dMeal']]">

 <xsl:sort
   select="ancestor::ab[@type='dDay']/date/@when"
   order="ascending"/>

 <xsl:sort select="normalize-space(.)"/>
 <xsl:sequence select="."/>
</xsl:for-each>

3.5. Grouping

Indices get more complex when data needs to be grouped.

What if we wanted to organize people by the event in which they are mentioned?

Example encoding:

<seg type="dMeeting" subtype="CG">
 <persName ref="/people/ELW01.html">S Elwes</persName>
</seg>
<seg type="dMeal" subtype="D">Dine at <persName ref="/people/SMI02.html">
  <placeName type="venue">C Smith's</placeName>
 </persName>, w. <persName ref="/people/FEN01|FEN03.html">Fenwicks</persName>
</seg>

3.6. xsl:for-each-group

Main components:

  • select: what needs to be grouped (the population)
  • a grouping pattern that must be matched. Expressed with group-by, group-adjacent, group-starting-with or group-ending-with.
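
As a first taste of these two components, the sketch below groups the day entries of the diary data in section 3.1 by month. It assumes the data is in the TEI namespace; the <month> output element and its attributes are invented for the example, and the functions current-grouping-key() and current-group() are the ones described in section 3.8.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0">
 <xsl:output method="xml" indent="yes"/>
 <xsl:template match="div[@type='dYear']">
  <!-- population: every day entry; grouping key: the YYYY-MM part of the date -->
  <xsl:for-each-group select=".//ab[@type='dDay']"
    group-by="substring(date/@when, 1, 7)">
   <month n="{current-grouping-key()}" days="{count(current-group())}"/>
  </xsl:for-each-group>
 </xsl:template>
</xsl:stylesheet>
```

For the three entries shown in section 3.1 this should give a single <month> element with n="1798-01" and days="3".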

3.7. Simple grouping (1)

Select all persName elements; group them by the seg/@type that contains them.

<xsl:for-each-group select="//persName" group-by="ancestor::seg[1]/@type">
<!-- ... -->
</xsl:for-each-group>

3.8. xsl:for-each-group functions

The <xsl:for-each-group> element allows you to use certain functions in XPath:

  • current-grouping-key() refers to the result of the XPath in the grouping condition
  • current-group() refers to the group selected at the current iteration

N.B. current-group() returns item()*, which means that you can iterate over it.

3.9. Simple grouping (2)

Select all <persName> elements; group them by the type of the <seg> that contains them.

<xsl:for-each-group select="//persName" group-by="ancestor::seg[1]/@type">
<!-- Let's write out the type of event -->
 Event: <xsl:value-of select="current-grouping-key()"/>
<!-- Now let's print out each persName in the group. Remember that current-group() is iterable -->
 <xsl:for-each select="current-group()">
  <xsl:value-of select="."/>
 </xsl:for-each>
</xsl:for-each-group>

3.10. Grouping for format conversion and output

Grouping can be helpful when converting to other formats.

Example: dealing with TEI milestones in HTML.

3.11. Splitting elements around milestones (1)

TEI input

<text>
 <body>
  <p>
   <del rend="overstrike">Card room where <lb/> nine out of ten had no
       inclination</del>
  </p>
 </body>
</text>

TEI output

<del rend="overstrike">Card room where </del>
<lb/>
<del rend="overstrike"> nine out of
ten had no inclination</del>

3.12. Splitting elements around milestones

XSLT

<xsl:template match="del">
 <xsl:choose>
  <xsl:when test="lb">
   <xsl:for-each-group select="node()" group-starting-with="lb">
<!-- Copies the only lb in the group first (N.B the first group does not contain lb) -->
    <xsl:sequence select="current-group()/self::lb"/>
    <del>
     <xsl:sequence select="current()/ancestor::del/@*"/>
<!-- Copies the elements in the group except lb -->
     <xsl:sequence select="current-group()[not(self::lb)]"/>
    </del>
   </xsl:for-each-group>
  </xsl:when>
  <xsl:otherwise>
   <xsl:sequence select="."/>
  </xsl:otherwise>
 </xsl:choose>
</xsl:template>

3.13. A closer look at group-starting-with

Each group should start with lb, but members of the population before <lb> are also grouped.

First iteration's current-group():
  • ‘Card room where’
Second iteration's current-group():
  • <lb>
  • ‘nine out of ten had no inclination’

3.14. ‘group-adjacent’ for creating structure

Some badly formed input

<p>we saw three sorts of animals: <item>cats</item>
 <item>dogs</item>
 <item>lions</item>
and many others besides</p>
Desired output:
<p>we saw three sorts of animals: <list>
  <item>cats</item>
  <item>dogs</item>
  <item>lions</item>
 </list> and many others besides </p>

3.15. The XSL to do that

<xsl:template match="p">
 <xsl:copy>
  <xsl:for-each-group
    select="node()"
    group-adjacent="if (self::item) then 1 else 2">

   <xsl:choose>
    <xsl:when test="current-grouping-key()=1">
     <list>
      <xsl:copy-of select="current-group()"/>
     </list>
    </xsl:when>
    <xsl:otherwise>
     <xsl:copy-of select="current-group()"/>
    </xsl:otherwise>
   </xsl:choose>
  </xsl:for-each-group>
 </xsl:copy>
</xsl:template>

3.16. XPath: some string functions

Function         Example                            Result
concat           concat('See ',ref)                 See Jones 72
string-join      string-join(//foreign,'+')         doigts+bientot
substring        substring(head,2,4)                orni
string-length    string-length(country)             7
upper-case       upper-case('ab')                   AB
lower-case       lower-case('AB')                   ab
translate        translate('A cat','Aa','11')       1 c1t
contains         contains('dog','o')                true
starts-with      starts-with('Denmark','A')         false
ends-with        ends-with('Iceland','d')           true
matches          matches(date,'[0-9]+')             true
replace          replace(date/@when,'[: ,-.]','')   19551302
tokenize         tokenize(date/@when,'-')           1797 12 31
distinct-values  distinct-values(//name)            Matthew Mark Luke John

3.17. Your own functions (1)

XPath 2.0 within XSLT 2.0 is quite powerful, but there are things it cannot do:
  • declare variables
  • grouping (without workarounds)
  • declare functions
  • ...

The <xsl:function> element can be used to define XSLT code to be called within XPath.

3.18. Your own functions (2)

Uses:
  • common XPath code to be re-used (for replacing, tokenizing, etc.)
  • determine complex conditions for sorting and grouping
  • assist a complex select on <xsl:template> or <xsl:apply-templates>
  • whenever <xsl:template> is not enough
  • ...

3.19. Function example

Ignore accents when sorting.

<xsl:function name="tei:stripAccents">
 <xsl:param name="s"/>
 <xsl:variable name="accents">àáâèéêòóôìíî</xsl:variable>
 <xsl:variable name="noAccents">aaaeeeoooiii</xsl:variable>
 <xsl:sequence
   select="translate(lower-case($s), $accents, $noAccents)"/>

</xsl:function>
<xsl:template match="/">
 <xsl:for-each select="//persName">
  <xsl:sort
    select="tei:stripAccents(normalize-space(.))"/>

  <name>
   <xsl:sequence select="."/>
  </name>
 </xsl:for-each>
</xsl:template>

All functions must be in some namespace.
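
In practice this means declaring a prefix on <xsl:stylesheet> and using it in the function name. The sketch below also declares parameter and return types, which is optional but catches mistakes early; the function tei:sortKey and the <name> output element are hypothetical.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:tei="http://www.tei-c.org/ns/1.0"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0"
  exclude-result-prefixes="xs tei">
 <!-- the tei: prefix of the function name must be bound to a namespace -->
 <xsl:function name="tei:sortKey" as="xs:string">
  <!-- xs:string? also accepts an empty sequence -->
  <xsl:param name="s" as="xs:string?"/>
  <xsl:sequence select="lower-case(normalize-space($s))"/>
 </xsl:function>
 <xsl:template match="/">
  <xsl:for-each select="//persName">
   <!-- the function is called like any built-in XPath function -->
   <xsl:sort select="tei:sortKey(.)"/>
   <name>
    <xsl:value-of select="normalize-space(.)"/>
   </name>
  </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>
```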

3.20. Reading many files at a time

Suppose we want to use this technique, but across a whole directory of files? The Saxon processor has a small extension to allow you to get a list of files, and then access them with the collection() function:

<xsl:template name="main">
 <xsl:variable name="pathlist">
  <xsl:text>.?select=*.xml;recurse=yes;on-error=warning</xsl:text>
 </xsl:variable>
 <xsl:variable name="docs" select="collection($pathlist)"/>
 <xsl:for-each select="$docs">
  <xsl:message>
   <xsl:sequence select="base-uri()"/>
  </xsl:message>
 </xsl:for-each>
</xsl:template>

Note that here there is just a named template. You need to tell the processor to start with that.

3.21. How do you specify an initial template in oXygen?

Select the "advanced options" icon (on right-hand side of Transformer: box).

3.22. How do you specify an initial template in oXygen? (2)

3.23. Sequences in XSLT 2.0

A sequence is an ordered list of items (nodes or atomic values) which can be iterated over or operated upon using a function. You can use a normal XPath expression:
<xsl:sequence select="//idno"/>
or construct a sequence explicitly using ():
<xsl:sequence select="(1,2,3,'hello')"/>
Unlike most other instructions, <xsl:sequence> can return a sequence containing existing nodes, rather than constructing new ones.
Functions operate on sequences:
<xsl:value-of select="sum((1,2,45))"/>
and so do iterators:
<xsl:for-each select="(1 to 10)">
 <xsl:message>hello!</xsl:message>
</xsl:for-each>
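
A few more sequence operations, as a small sketch that needs no input document (it assumes the processor is started at the named template main, as in section 3.20). The variable name and the values are arbitrary.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>
 <xsl:template name="main">
  <xsl:variable name="s" select="(3, 1, 2, 'hello')"/>
  <!-- count() counts the items: 4 -->
  <xsl:value-of select="count($s)"/>
  <xsl:text>&#10;</xsl:text>
  <!-- a positional predicate picks one item: hello -->
  <xsl:value-of select="$s[4]"/>
  <xsl:text>&#10;</xsl:text>
  <!-- reverse() returns a new sequence: hello 2 1 3 -->
  <xsl:value-of select="reverse($s)" separator=" "/>
  <xsl:text>&#10;</xsl:text>
  <!-- sequences never nest, and () adds nothing: 3 -->
  <xsl:value-of select="count((1, (2, 3), ()))"/>
  <xsl:text>&#10;</xsl:text>
 </xsl:template>
</xsl:stylesheet>
```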

3.24. XSLT for reporting

It is easy to write some XSL for summarizing your text:

<xsl:template match="/"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:message>Values for IDNO: <xsl:sequence
    select="string-join(distinct-values(//idno),' ')"/>

 </xsl:message>
 <xsl:apply-templates select="*"/>
</xsl:template>
You could
  • Use <xsl:message> to just produce messages on the screen
  • Use <xsl:output method="text"/> to produce a text file
  • Create a new TEI XML document as output, and make PDF or HTML from that

3.25. Manipulating your TEI

A basic technique to perform an (almost) identity transform:

<xsl:template
  match="@*|text()|comment()|processing-instruction()">

 <xsl:copy-of select="."/>
</xsl:template>
<xsl:template match="*">
 <xsl:copy>
  <xsl:apply-templates
    select="*|@*|processing-instruction()|comment()|text()"/>

 </xsl:copy>
</xsl:template>
<xsl:template match="p[not(@type)]">
 <ab xmlns="http://www.tei-c.org/ns/1.0">
  <xsl:apply-templates
    select="*|@*|processing-instruction()|comment()|text()"/>
 </ab>
</xsl:template>
<xsl:template match="pb"/>
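
The same identity transform is often written more compactly with node(), which covers elements, text, comments and processing instructions in one pattern. This is a sketch of that common idiom, not a replacement for the version above; the <pb> exception is carried over from it.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0">
 <!-- identity: copy each node, then recurse into its attributes and children -->
 <xsl:template match="@*|node()">
  <xsl:copy>
   <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
 </xsl:template>
 <!-- exceptions override the identity rule because a name test has a higher default priority -->
 <xsl:template match="pb"/>
</xsl:stylesheet>
```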

3.26. Manipulating your TEI (2)

  • ‘remove punctuation between elements in <biblStruct>’
    <xsl:template match="biblStruct">
     <xsl:copy>
      <xsl:apply-templates select="*"/>
     </xsl:copy>
    </xsl:template>
  • ‘replace all occurrences of Driscoll, M with M Driscoll’
    <xsl:template match="text()">
     <xsl:sequence
       select="replace(.,'Driscoll, M', 'M Driscoll')"/>

    </xsl:template>

3.27. Manipulating your TEI (3)

A bit harder? ‘replace all occurrences of Driscoll, M with <forename>Driscoll</forename><surname>Matthew</surname>’

<xsl:template match="text()">
 <xsl:choose>
  <xsl:when test="contains(.,'Driscoll, M')">
   <xsl:value-of select="substring-before(.,'Driscoll, M')"/>
   <forename>Driscoll</forename>
   <surname>Matthew</surname>
   <xsl:value-of select="substring-after(.,'Driscoll, M')"/>
  </xsl:when>
  <xsl:otherwise>
   <xsl:copy-of select="."/>
  </xsl:otherwise>
 </xsl:choose>
</xsl:template>

3.28. Manipulating your TEI (4)

We can be more elegant than that:

<xsl:template match="text()">
 <xsl:analyze-string select="." regex="(Driscoll),\s*(M[a-z]*)\.?">
  <xsl:matching-substring>
   <forename>
    <xsl:sequence select="regex-group(1)"/>
   </forename>
   <surname>
    <xsl:sequence select="regex-group(2)"/>
   </surname>
  </xsl:matching-substring>
  <xsl:non-matching-substring>
   <xsl:copy-of select="."/>
  </xsl:non-matching-substring>
 </xsl:analyze-string>
</xsl:template>

3.29. Using the regex version

Input

<p>Driscoll, M...</p>
<p>Driscoll, M, Driscoll,M</p>
<p>I met Driscoll, Matthew one morning</p>

Output

<p>
 <forename>Driscoll</forename>
 <surname>M</surname>..
</p>
<p>
 <forename>Driscoll</forename>
 <surname>M</surname>, <forename>Driscoll</forename>
 <surname>M</surname>
</p>
<p>I met <forename>Driscoll</forename>
 <surname>Matthew</surname> one morning </p>

3.30. The structure of analyze-string

<xsl:analyze-string select="$text" regex="{$pattern}">
 <xsl:matching-substring>
<!-- what to do if pattern matches -->
<!-- you can refer to regex-group(0) for the whole match -->
<!-- or eg regex-group(1) and regex-group(2) for bits -->
 </xsl:matching-substring>
 <xsl:non-matching-substring>
<!-- what to do if pattern does not match -->
 </xsl:non-matching-substring>
</xsl:analyze-string>
Notes:
  • it can do multiple matches unless you put anchors in the pattern (^ and $ for start and end)
  • inside the analyze-string, you are no longer in the document context — don't try eg test="parent::p"
  • \s = whitespace, \w = word characters, \d = digits
See (eg) http://www.regular-expressions.info/reference.html
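
The note about context can be worked around by saving the node in a variable before entering <xsl:analyze-string>. The sketch below does this; the task (wrapping four-digit years found inside <p> in a <date>) and the variable name here are assumptions made for the example. Because regex is an attribute value template, the braces of the quantifier have to be doubled.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0">
 <xsl:template match="@*|*">
  <xsl:copy>
   <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
 </xsl:template>
 <xsl:template match="text()">
  <!-- remember the text node: inside analyze-string "." is only a string -->
  <xsl:variable name="here" select="."/>
  <xsl:analyze-string select="." regex="\d{{4}}">
   <xsl:matching-substring>
    <xsl:choose>
     <!-- navigate from the saved node, not from "." -->
     <xsl:when test="$here/parent::p">
      <date xmlns="http://www.tei-c.org/ns/1.0" when="{.}">
       <xsl:value-of select="."/>
      </date>
     </xsl:when>
     <xsl:otherwise>
      <xsl:value-of select="."/>
     </xsl:otherwise>
    </xsl:choose>
   </xsl:matching-substring>
   <xsl:non-matching-substring>
    <xsl:value-of select="."/>
   </xsl:non-matching-substring>
  </xsl:analyze-string>
 </xsl:template>
</xsl:stylesheet>
```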

3.31. That thing about context, it's important

Consider this:
<xsl:for-each select="distinct-values(//*/name())">
 <xsl:variable name="name" select="."/>
 <xsl:message>
  <xsl:value-of select="count(//*[name()=$name])"/>
 </xsl:message>
</xsl:for-each>
why does that not work?

3.32. Context, how to solve it

Store the current context in a variable:

<xsl:variable name="orig" select="/"/>
<xsl:for-each select="distinct-values(//*/name())">
 <xsl:variable name="name" select="."/>
 <xsl:message>
  <xsl:value-of select="count($orig//*[name()=$name])"/>
 </xsl:message>
</xsl:for-each>

Note that you cannot get from one of the values there to its context.
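
An alternative sketch that sidesteps the problem: grouping the elements by name keeps nodes, not strings, as the items being iterated over, so no context is lost. This is offered as one possible approach, not as the method used in the original course material.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:template match="/">
  <!-- one group per distinct element name -->
  <xsl:for-each-group select="//*" group-by="name()">
   <xsl:sort select="current-grouping-key()"/>
   <xsl:message>
    <!-- element name, then the number of elements in the group -->
    <xsl:value-of
      select="current-grouping-key(), count(current-group())"
      separator=": "/>
   </xsl:message>
  </xsl:for-each-group>
 </xsl:template>
</xsl:stylesheet>
```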

3.33. Profiling your stylesheet

Sometimes, you really want to know where your stylesheet is spending its time. Using the Saxon processor, you can get a profile. On a command line:
java -jar /usr/share/saxon9he.jar -TP -o:test.html \
  -s:test.xml -xsl:/usr/share/xml/tei/stylesheet/xhtml2/tei.xsl \
  >& profile.html

(this assumes you know where the Saxon .jar file is, where the stylesheet is, and you know how to redirect stderr to a file)

3.34. Profiling your stylesheet: result

4. ODD for profit

  • Overview of the ODD language
  • Roma, OxGarage and oXygen: processing your ODD
  • Using ODD and Schematron to get tight constraints

4.1. Why might you need ODD?

  • You need to define an XML schema to describe your resource
  • You need to provide documentation about
    • the semantics of your XML schema
    • constraints, usage notes, examples
  • You need to keep the two in step
  • You want to share the results
    • with others
    • with yourself, long term
  • you don't want to reinvent the wheel

4.2. The basic idea (1)

A special XML vocabulary for defining....
  • schemas
  • XML element types independent of a schema
  • public or private groups of such elements
  • patterns (MLE macros)
  • classes (and subclasses) of element
And also for defining references which can pull into a schema
  • named components from the above list
  • objects from other namespaces

All embedded within conventional document markup elements

4.3. The basic idea (2)

An ODD processor:

  • assembles all the components referenced or directly provided
  • resolves multiple declarations
  • may do some validity checking
  • emits a schema in one or more formal languages
  • emits a "plain" XML document with selected documentary components

http://www.tei-c.org/Roma/

http://www.tei-c.org/Byzantium/

http://oxgarage.oucs.ox.ac.uk:8080/ege-webclient/

4.4. ODD conversion in oXygen

4.5. ODD conversion in Oxgarage

4.6. A simple example

We have <stuff>, which contains a mixture of <bit>s and <bob>s. We have never heard of the TEI and we don't want to use it. Likewise namespaces.

<schemaSpec ns="" start="stuff" ident="simpleS">
 <elementSpec ident="stuff">
  <desc>Root element for a very simple schema</desc>
  <content>
   <rng:oneOrMore>
    <rng:choice>
     <rng:ref name="bit"/>
     <rng:ref name="bob"/>
    </rng:choice>
   </rng:oneOrMore>
  </content>
 </elementSpec>
<!-- ... continues on next slide -->
</schemaSpec>

4.7. A simple example, contd.


<!-- ... contd -->
<elementSpec ident="bob">
 <desc>Empty pointing element in a very simple schema</desc>
 <content>
  <rng:empty/>
 </content>
 <attList>
  <attDef ident="href">
   <desc>supplies the URI of the thing pointed at</desc>
   <datatype>
    <rng:data type="anyURI"/>
   </datatype>
  </attDef>
 </attList>
</elementSpec>
<elementSpec ident="bit">
 <desc>textual element in a very simple schema (may have bobs in it)</desc>
 <content>
  <rng:zeroOrMore>
   <rng:choice>
    <rng:text/>
    <rng:ref name="bob"/>
   </rng:choice>
  </rng:zeroOrMore>
 </content>
</elementSpec>
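
To make the schema concrete, here is a small instance document that should be valid against it, assuming href is optional (no usage is stated on the <attDef>). The text and the example.org URIs are invented.

```xml
<stuff>
 <!-- a bit may mix text and bobs -->
 <bit>some text with a pointer <bob href="http://www.example.org/a"/> inside it</bit>
 <!-- a bob may also appear directly inside stuff -->
 <bob href="http://www.example.org/b"/>
 <!-- zeroOrMore allows a bit to be empty -->
 <bit/>
</stuff>
```

An empty <stuff/> would not be valid, since its content model requires at least one <bit> or <bob>.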

4.8. So what?

  • We can now build a schema in RELAXNG, W3C schema, or DTD language by a simple XSLT transformation
  • We can also extract documentary fragments (e.g. the descriptions of elements and attributes)
TEI provides a special element for the latter purpose:
<specList>
 <specDesc key="bit"/>
 <specDesc key="bob"/>
</specList>
which would generate something like
<bit>: textual element in a very simple schema (may have bobs in it)
<bob>: Empty pointing element in a very simple schema
inside our running text.

4.9. What else might you want to say about your elements?

  • alternative <desc>s or <gloss>es in different languages maybe?
  • some reference usage examples
  • Schematron constraints
  • value lists
  • class memberships

4.10. Alternative descriptions

<elementSpec xmlns="http://www.tei-c.org/ns/1.0" module="core" ident="p">
 <gloss>paragraph</gloss>
 <gloss version="2007-05-02" xml:lang="zh-tw">段落</gloss>
 <desc>marks paragraphs in prose.</desc>
 <desc version="2007-05-02" xml:lang="zh-tw">標記散文的段落。</desc>
 <desc version="2008-04-05" xml:lang="ja">散文の段落を示す. </desc>
 <desc version="2009-01-06" xml:lang="fr">marque les paragraphes dans un texte en prose.</desc>
 <desc version="2007-05-04" xml:lang="es">marca párrafos en prosa.</desc>
 <desc version="2007-01-21" xml:lang="it">indica i paragrafi in prosa</desc>
<!-- ... -->
</elementSpec>

4.11. Usage examples

The <exemplum> element combines an XML example with some discussion of it...
<exemplum xml:lang="en">
 <egXML xmlns="http://www.tei-c.org/ns/Examples">
  <langUsage>
   <language ident="en">English</language>
  </langUsage>
 </egXML>
 <p>In the source of the TEI Guidelines, this element declares itself and its content as belonging to the namespace <ident type="ns">http://www.tei-c.org/ns/Examples</ident>. This enables the content of the element to be validated independently against the TEI scheme.</p>
</exemplum>

4.12. Defining the content of an element

  • We use RELAXNG directly to define content for elements and attributes (rather than re-invent an equally expressive language)
  • Generated patterns are uniquified by means of an automatic prefix, which can be switched on or off
  • Content can be constrained by means of a <valList> element ...
  • ... or by means of a <datatype> element (which also uses RELAXNG)
  • Generic constraints can be expressed by means of <constraintSpec> elements (which use e.g. ISO Schematron)

4.13. About this wheel of yours...

The TEI does actually define elements very like yours. Why not just use them?
<schemaSpec
  source="/usr/share/xml/tei/odd/Source/Guidelines/en/guidelines-en.xml"
  start="div"
  ident="simpleS-2">

 <elementRef key="div"/>
 <elementRef key="p"/>
 <elementRef key="ptr"/>
</schemaSpec>

The source attribute is a URI of any kind, from which specifications are available. It could be a file name, a URL, a DOI...

4.14. Why use the TEI definitions?

  • Principle of least effort
  • Your resources now have a standard semantics attached to them
  • (And you can explain how you've interpreted them in your own documentation)
And (if you like) you can mix and match:
<schemaSpec
  source="/usr/share/xml/tei/odd/Source/Guidelines/en/guidelines-en.xml"
  start="stuff"
  ident="simpleS-3">

 <elementSpec ns="" ident="stuff">
  <desc>Root element for a very
     simple
     schema</desc>
  <content>
<!-- as before -->
  </content>
 </elementSpec>
 <elementRef key="p"/>
 <elementRef key="ptr"/>
</schemaSpec>

4.15. In the real world, elements come in packs

A module is a named collection of elements. The TEI provides 22 such. To include one of them in a schema, use the <moduleRef> element:
<schemaSpec start="TEI" ident="testSchema-4">
 <moduleRef key="core"/>
 <moduleRef key="header"/>
 <moduleRef key="textstructure"/>
</schemaSpec>

Every TEI element belongs to a single module and has a unique name.

4.16. Recap

  • The TEI encoding scheme consists of a number of modules
  • Each module contains a number of element specifications
  • Each element specification contains:
    • a canonical name (<gi>) for the element, and optionally other names in other languages
    • a canonical description (also possibly translated) of its function
    • a declaration of the classes to which it belongs
    • a definition for each of its attributes
    • a definition of its content model
    • usage examples and notes
  • a TEI schema specification (<schemaSpec>) can contain
    • references to modules or elements
    • (re)declarations for elements, classes, or macros
  • a TEI document containing a schema specification is called an ODD (One Document Does it all)

4.17. The TEI modules

analysis Simple analytic mechanisms
certainty Certainty and uncertainty
core Elements common to all TEI documents
corpus Header extensions for corpus texts
declarefs Feature system declarations
dictionaries Printed dictionaries
drama Performance texts
figures Tables, formulae, and figures
gaiji Character and glyph documentation
header The TEI Header
iso-fs Feature structures
linking Linking, segmentation and alignment
msdescription Manuscript Description
namesdates Names and dates
nets Graphs, networks and trees
spoken Transcribed Speech
tagdocs Documentation of TEI modules
tei Declarations for datatypes, classes, and macros available to all TEI modules
textcrit Text criticism
textstructure Default text structure
transcr Transcription of primary sources
verse Verse structures

4.18. Using the TEI Class System

When defining a new element, we need to consider
  • its name and description
  • what attributes it can carry
  • what it can contain
  • where it can appear in a document

The TEI class system helps us answer all these questions (except the first).

4.19. Attribute Classes

  • Attribute classes are given (usually adjectival) names beginning with att.; e.g. att.naming, att.typed
  • all members of att.naming inherit from it attributes key and ref; all members of att.typed inherit from it type and subtype
  • If we want an element to carry the type attribute, therefore, we add the element to the att.typed class, rather than define those attributes explicitly.

4.20. A very important attribute class: att.global

All TEI elements are declared to be a member of att.global; this class provides, among others:
xml:id
a unique identifier
xml:lang
the language of the element content
n
a number or name for an element
rend
how the element in question was rendered or presented in the source text.

4.21. Model Classes

  • Model classes contain groups of elements which are allowed in the same place. e.g. if you are adding an element which is wanted wherever the <bibl> is allowed, add it to the model.biblLike class
  • Model classes are usually named with a Like or Part suffix:
    • members of model.pLike are all things which ‘behave like’ paragraphs, and are permitted in the same places as paragraphs
    • members of model.pPart are all things which can appear within paragraphs. This class is subdivided into
      • model.pPart.edit elements for simple editorial intervention such as <corr>, <del> etc.
      • model.pPart.data ‘data-like’ elements such as <name>, <num>, <date> etc.
      • model.pPart.msdesc extra elements for manuscript description such as <seal> or <origPlace>

4.22. Basic Model Class Structure

Simplifying wildly, one may say that the TEI recognises three kinds of element:
divisions
high level major divisions of texts
chunks
elements such as paragraphs appearing within texts or divisions, but not other chunks
phrase-level elements
elements such as highlighted phrases which can occur only within chunks
There are ‘base model classes’ corresponding with each of these, and also with the following two groupings:
inter-level elements
elements such as lists which can appear either in or between chunks
components
elements which can appear directly within texts or text divisions

And yes, there is a class model.global for elements that can appear anywhere — at any hierarchic level.

4.23. Specifying a class

The <classSpec> element is used to declare a class. Its type attribute indicates whether this is an attribute class or a model class.

For a model class, the class specification is purely documentary. For an attribute class it contains an <attList>, which specifies the attributes it provides.

Elements are classified (i.e. classes are referenced) by means of the <memberOf> child of the <classes> element inside an <elementSpec>. Classes can also be members of other classes.

<classSpec ident="model.foo" type="model">
 <desc>The foo class consists solely of elements with silly names made up for didactic
   purposes</desc>
</classSpec>
<classSpec ident="att.foo" type="atts">
 <desc>The foo class provides the attribute <att>bar</att>
 </desc>
 <attList>
  <attDef ident="bar">
<!-- ... -->
  </attDef>
 </attList>
</classSpec>
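
As a sketch of how class membership is declared in practice, the following element joins the two classes defined above, plus the TEI class att.typed. The element name wibble, its description and its content model are invented for illustration; only <classes> and <memberOf> come from the text.

```xml
<elementSpec xmlns="http://www.tei-c.org/ns/1.0" ident="wibble" mode="add">
 <desc>hypothetical element, invented for this illustration</desc>
 <classes>
  <!-- allowed wherever other members of model.foo are allowed -->
  <memberOf key="model.foo"/>
  <!-- inherits the bar attribute from att.foo -->
  <memberOf key="att.foo"/>
  <!-- inherits type and subtype from the TEI class att.typed -->
  <memberOf key="att.typed"/>
 </classes>
 <content>
  <rng:text xmlns:rng="http://relaxng.org/ns/structure/1.0"/>
 </content>
</elementSpec>
```

None of the three attributes is declared on the element itself; they all arrive through class membership, which is the point of the class system.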

4.24. Picking and choosing (1)

You can specify elements to be excluded from those provided by a module:
<schemaSpec start="TEI" ident="testSchema-4a">
 <moduleRef key="core" except="mentioned quote said"/>
 <moduleRef key="header"/>
 <moduleRef key="textstructure"/>
</schemaSpec>
This is equivalent to the following:
<schemaSpec start="TEI" ident="testSchema-4b">
 <moduleRef key="core"/>
 <moduleRef key="header"/>
 <moduleRef key="textstructure"/>
 <elementSpec ident="mentioned" mode="delete"/>
 <elementSpec ident="quote" mode="delete"/>
 <elementSpec ident="said" mode="delete"/>
</schemaSpec>

The mode attribute instructs an ODD processor how to resolve multiple declarations.

4.25. Picking and choosing (2)

You can specify just the elements you want to include:
<schemaSpec start="TEI" ident="testSchema-4b">
 <moduleRef key="core"/>
 <moduleRef key="header"/>
 <moduleRef key="textstructure" include="body div"/>
</schemaSpec>
This is equivalent to the following:
<schemaSpec start="TEI" ident="testSchema-4b">
 <moduleRef key="core"/>
 <moduleRef key="header"/>
 <elementRef key="div"/>
 <elementRef key="body"/>
</schemaSpec>

(Sadly not yet fully implemented in web Roma)

4.26. Unifying multiple declarations

As noted above, the mode attribute controls what an ODD processor should do when it finds multiple instances of some component.

Supposing that we have found one existing declaration, what should be done with a subsequent one for the same object?

mode value   existing declaration?   effect
add          no                      add new declaration to schema; process its children in add mode
add          yes                     raise error
replace      no                      raise error
replace      yes                     retain existing declaration; process new children in replace mode; ignore existing children
change       no                      raise error
change       yes                     process identifiable children according to their modes; process unidentifiable children in replace mode; retain existing children where no replacement or change is provided
delete       no                      raise error
delete       yes                     ignore existing declaration and its children
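
To make the change row concrete, here is a minimal sketch of a customization that alters one attribute of an existing TEI element and leaves everything else about it untouched. The choice of <div>, its type attribute, and the values chapter and appendix are assumptions made for illustration, as is the schema name.

```xml
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" start="TEI" ident="testSchema-modes">
 <moduleRef key="core"/>
 <moduleRef key="header"/>
 <moduleRef key="textstructure"/>
 <!-- div is already declared by textstructure, so change mode is legal here -->
 <elementSpec ident="div" module="textstructure" mode="change">
  <attList>
   <!-- an identifiable child: processed according to its own mode -->
   <attDef ident="type" mode="change">
    <!-- the new value list takes the place of any existing one -->
    <valList type="closed" mode="replace">
     <valItem ident="chapter"/>
     <valItem ident="appendix"/>
    </valList>
   </attDef>
  </attList>
 </elementSpec>
</schemaSpec>
```

Because the <elementSpec> supplies no <desc>, <classes> or <content>, the existing ones are retained, as the last clause of the change row describes.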

4.27. Specifying elements and modules

The ‘*-spec’ elements are all members of a class att.identifiable which provides an attribute ident that is used (rather than xml:id) as a unique identifier for them.

To reference such a declaration, we use the key attribute:
<elementRef key="bar"/>
<!-- implies the presence elsewhere of ... -->
<elementSpec ident="bar">
<!-- .... -->
</elementSpec>
Similarly:
<moduleRef key="foo"/>
<!-- implies the presence elsewhere of ... -->
<moduleSpec ident="foo"/>

4.28. Elements in modules

But note that elements indicate the module they belong to by means of their module attribute:
<elementSpec ident="bar" module="foo">.... </elementSpec>
so the <moduleSpec> is largely documentary.

Elements not declared by the TEI can be assigned to a user-defined module; its name is defaulted.

4.29. Specification of attributes

For reasons lost in the mists of time, the element <attSpec> is actually spelled <attDef>, but otherwise, it's just the same. Within an <elementSpec> or a <classSpec>, you can supply an <attList> containing a bunch of <attDef> elements, each with an ident:
<attList>
 <attDef ident="bax">....</attDef>
</attList>

4.30. Specifying value lists and datatypes

In general, the legal values for an attribute are defined by means of a <datatype> element, see later.

A common case, however, is to supply an enumeration (a list, open or closed, of legal values). This is done using the <valList> element, which groups a bunch of identifiable <valItem> elements, like this:
<attDef ident="status">
 <desc>indicates the state of the system using a predefined set of colour
   codes</desc>
 <defaultVal>green</defaultVal>
 <valList type="closed">
  <valItem ident="red">
   <desc>all systems shut down</desc>
  </valItem>
  <valItem ident="orange">
   <desc>systems shut-down imminent</desc>
  </valItem>
  <valItem ident="green">
   <desc>system status normal</desc>
  </valItem>
  <valItem ident="white">
   <desc>system status
       unrecorded</desc>
  </valItem>
 </valList>
</attDef>

4.31. Datatypes

Typically used to constrain attribute values:
<attDef ident="status">
 <datatype>
  <rng:ref name="data.enumerated"/>
 </datatype>
<!-- ... implies that a valList is supplied -->
</attDef>
<attDef ident="lastUpdated">
 <datatype>
  <rng:ref name="data.temporalExpr.w3c"/>
 </datatype>
</attDef>

TEI-defined datatypes are actually patterns, defined by a <macroSpec>.

4.32. Specifying a pattern

The <macroSpec> element is an identifiable element used to associate a name with any string. It has two typical uses in the TEI scheme:
  • defining common content models
  • defining TEI-specific datatypes
<macroSpec ident="data.foo">
 <desc>a new datatype I just invented</desc>
 <content>
<!-- RELAXNG pattern defining the datatype -->
 </content>
</macroSpec>
<macroSpec ident="macro.foo">
 <desc>a content model I plan to reuse often</desc>
 <content>
<!-- RELAXNG pattern defining the content model -->
 </content>
</macroSpec>

4.33. W3C datatypes in RELAXNG

The macro data.numeric looks like this:
<rng:choice>
 <rng:data type="double"/>
 <rng:data type="token">
  <rng:param name="pattern">(\-?[\d]+/\-?[\d]+)</rng:param>
 </rng:data>
 <rng:data type="decimal"/>
</rng:choice>

For that type attribute on <data> you can also use any of string, boolean, decimal, float, double, duration, dateTime, time, date, gYearMonth, gYear, gMonthDay, gDay, gMonth, or anyURI

Or you can narrow down the definition with a <param>, e.g. a regular expression.
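
A minimal sketch of narrowing a datatype with a <param> might look as follows. The attribute name shelfmark and the pattern itself are invented for illustration; the pattern facet comes from W3C Schema datatypes and has to match the whole value.

```xml
<attDef xmlns="http://www.tei-c.org/ns/1.0"
  xmlns:rng="http://relaxng.org/ns/structure/1.0" ident="shelfmark">
 <desc>hypothetical attribute: two capital letters followed by four digits</desc>
 <datatype>
  <rng:data type="token">
   <!-- W3C Schema regular expression, implicitly anchored at both ends -->
   <rng:param name="pattern">[A-Z]{2}[0-9]{4}</rng:param>
  </rng:data>
 </datatype>
</attDef>
```

With this definition a value such as AB1234 would be accepted, while ab1234 or AB12345 would be rejected.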

http://relaxng.org/

4.34. Schematron constraints

  • An element specification can also contain a <constraintSpec> element which contains rules about its content expressed as ISO Schematron constraints. e.g.:
<elementSpec ident="div" mode="change"   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"   xmlns:s="http://purl.oclc.org/dsdl/schematron">
 <constraintSpec ident="cartoon" scheme="isoschematron">
  <constraint>
   <assert xmlns="http://purl.oclc.org/dsdl/schematron"
    test="not(@type='cartoon') or .//tei:graphic">
a cartoon must include a
       graphic </assert>
  </constraint>
 </constraintSpec>
</elementSpec>
However...
  • You can only add such rules by editing your ODD file: Roma doesn't know about them.
  • Not all schema languages can implement these constraints.

4.35. A <constraintSpec> in TEI ODD

The rule applies to the context of the element in which it is defined.

  • It must have a scheme to identify the constraint language (isoschematron)
  • It must have a unique identifier
  • It contains one or more <constraint>
  • Each <constraint> has an <assert> or <report> in the http://purl.oclc.org/dsdl/schematron namespace
  • The test attribute is an XPath expression. The prefix tei is defined in the TEI for you

4.36. Writing Schematron XPath expressions

  • The <assert> element prints its body text if the expression resolves to false
  • The <report> element prints its body text if the expression resolves to true
  • You can use <name/> in the message text, to give the context, but not other markup

There are other Schematron facilities to help give more useful reports, but the XPath expression is the key tool.
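
As a hedged illustration of <report> and <name/>, the sketch below flags any <div> with more than one heading. The rule itself and the identifier oneHead are assumptions invented for this example (the TEI does permit several <head> elements); only the mechanics follow the points above.

```xml
<elementSpec xmlns="http://www.tei-c.org/ns/1.0" ident="div" mode="change">
 <constraintSpec ident="oneHead" scheme="isoschematron">
  <constraint>
   <!-- report fires when the test is TRUE, i.e. when there are two or more heads -->
   <report xmlns="http://purl.oclc.org/dsdl/schematron"
     test="count(tei:head) &gt; 1">a <name/> should not have more than one heading</report>
  </constraint>
 </constraintSpec>
</elementSpec>
```

The same rule could be written as an <assert> by inverting the test to count(tei:head) &lt;= 1; which form to use is largely a matter of which reads more naturally.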

More details at http://www.schematron.com/

4.37. Using the Schematron rules

You have various ways of using the rules:
  1. Ask Roma to extract the Schematron rules into a file, and compile that into XSLT
  2. Ask oXygen to use the Schematron embedded in a RELAX NG schema

4.38. Amongst the sort of things you can check with Schematron

  • Co-occurrence constraints: ‘if there is an attribute X, there must also be a Y’
  • Contextual counting: ‘there can only be one <title> child of a <titleStmt>’
  • Text content: ‘The word SECRET cannot appear in an author name’
  • Contextual constraint: ‘Words in English (xml:lang='en') cannot occur inside Latin phrases (xml:lang='la')’
  • Referential integrity: ‘a pointer URL starting with a # must have a corresponding xml:id somewhere in the document’
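
The referential integrity check in the last bullet could be sketched like this. It assumes the pointer lives in the target attribute of <ptr> and that target holds a single URI; a space-separated list of pointers would need a more elaborate test. The identifier localTarget is invented.

```xml
<elementSpec xmlns="http://www.tei-c.org/ns/1.0" ident="ptr" mode="change">
 <constraintSpec ident="localTarget" scheme="isoschematron">
  <constraint>
   <!-- passes when the target is not a local pointer,
        or when some xml:id in the document equals the part after the # -->
   <assert xmlns="http://purl.oclc.org/dsdl/schematron"
     test="not(starts-with(@target, '#')) or //@xml:id = substring(@target, 2)">
     a local pointer must match an xml:id somewhere in the document</assert>
  </constraint>
 </constraintSpec>
</elementSpec>
```

The "not(condition) or requirement" shape is a common way to write co-occurrence rules in a single assertion: the requirement is only enforced where the condition holds.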

4.39. Copying the Schematron approach

You may prefer to write a simple XSLT of your own to test your document:

<xsl:template match="q"   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:if test="count(ancestor-or-self::q)>3">
  <xsl:message>Quotes nested 3 deep?
     really????</xsl:message>
 </xsl:if>
</xsl:template>

5. TEI stylesheets

The family of XSL stylesheets which are delivered with the TEI on Sourceforge have three functions:
  • implementation of the ODD schema meta-language, ie providing tools to take an ODD customization of the TEI and generate the appropriate schemas and documentation (what the Roma web service does)
  • plausible rendering to HTML, XSL FO, Word, Open Office, ePub and LaTeX of typical born-digital TEI documents
  • vehicle for managing other conversions in and out of TEI (Word, Open Office, ePub, Docbook, etc)
The code is only developed in XSLT 2.0 for conformant TEI P5.
The stylesheet family is available
  • within oXygen as default transformation for TEI documents
  • as downloadable Debian/Ubuntu packages
  • inside the online OxGarage converter
In oXygen, the conversions to and from word-processor formats are available from within the editor.

5.1. Why should you use this library of stylesheets, rather than roll your own?

  • They solve problems you may not have thought of (e.g. generating the right links when making multiple output files)
  • They cover a variety of output formats you may need one day
  • It is better to co-operate on an open-source project than reinvent too many wheels
  • They are already widely packaged and distributed

5.2. Limitations

These stylesheets only do what they were designed to do!
  • They do not provide a rendering of all TEI elements
  • They do not implement all possible values of every rend attribute
  • The different output formats are not always in sync, and do not always give the same result
but they do deal with quite a few common problems.

5.3. Output assumptions

The stylesheets attempt to work in the same way with each of the supported output formats, but note:
  • The HTML output is designed to work with an associated CSS stylesheet, which takes care of much of the detailed spacing and font work; however, the HTML is in charge of features such as the numbering of sections.
  • The LaTeX output is designed for people who understand how to use existing LaTeX packages and classes; it therefore tries to produce reasonably readable TeX markup, with high-level commands whose effects will be determined by LaTeX (including numbering and spacing).
  • The XSL FO output produces a very detailed specification of the output layout, with all the details of fonts, numbering, vertical and horizontal spacing specified in situ. The FO processor is only responsible for line and page breaking, and hyphenation.

5.4. How the stylesheets work

Fundamental XSLT to understand
  • import and include
  • top-level parameters and variables
  • named templates and hooks

5.5. Feature: import and include

<xsl:import href="...">: include a file of XSLT templates, overriding them as needed

<xsl:include href="...">: include a file of XSLT templates, but do not override them

If you want to pull in a file which has the same template as the current file:
  • if you use <import>, the one in the current file has a higher priority
  • if you use <include>, you will get an error, unless you manually assign a higher priority to one or the other
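
A minimal sketch of the <import> case follows. The file base.xsl is hypothetical and is assumed to contain its own template matching head; the local template wins, and can still call the imported one.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <!-- xsl:import must come before any other top-level element -->
 <!-- base.xsl is a hypothetical file that also has a template matching head -->
 <xsl:import href="base.xsl"/>

 <!-- this template has higher import precedence than the one in base.xsl -->
 <xsl:template match="head">
  <div class="heading">
   <!-- hand the same node to the overridden template in base.xsl -->
   <xsl:apply-imports/>
  </div>
 </xsl:template>
</xsl:stylesheet>
```

Replacing <xsl:import> with <xsl:include> here would leave two head templates of equal standing, which is the conflict described in the second bullet.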

5.6. Top-level <param> and <variable>

  • You can declare variables directly as children of <stylesheet>
    <xsl:variable name="TEI">Text Encoding
    Initiative</xsl:variable>
    as a convenience
  • You can also declare parameters directly as children of <stylesheet>, but these can be overridden when the stylesheet is called:
    <xsl:param name="logo">../Graphics/logo</xsl:param>

5.7. <import> example

<xsl:import
  href="/usr/share/xml/tei/stylesheet/slides2/teihtml-slides.xsl"/>

<xsl:param name="logoFile">../Graphics/logo.png</xsl:param>
<xsl:param name="cssFile">teislides.css</xsl:param>
<xsl:param name="showNamespaceDecls">false</xsl:param>
<xsl:param name="forceWrap">true</xsl:param>
<xsl:param name="spaceCharacter"> </xsl:param>
<xsl:template name="lineBreak">
 <xsl:param name="id"/>
 <br/>
</xsl:template>

5.8. Feature: named template

Often, it is convenient to store common code in a named template for re-use or to make the code more readable:
<xsl:template match="div1|div2" mode="toc">
 <xsl:call-template name="header"/>
 <xsl:apply-templates/>
</xsl:template>
<xsl:template name="header">
 <li>
  <xsl:number level="multiple"/>
  <xsl:text> </xsl:text>
  <xsl:value-of select="head"/>
 </li>
</xsl:template>

5.9. Parameters to templates

You can also pass parameters to templates:
<xsl:template match="div">
 <xsl:call-template name="toc">
  <xsl:with-param name="text">
   <xsl:value-of select="head"/>
  </xsl:with-param>
 </xsl:call-template>
</xsl:template>
<xsl:template match="person">
 <xsl:call-template name="toc">
  <xsl:with-param name="text">
   <xsl:value-of select="surname"/>
   <xsl:text>, </xsl:text>
   <xsl:value-of select="forename"/>
  </xsl:with-param>
 </xsl:call-template>
</xsl:template>
<xsl:template name="toc">
 <xsl:param name="text"/>
 <li>
  <xsl:value-of select="$text"/>
 </li>
</xsl:template>

5.10. TEI Stylesheet family top-level layout

Directories for output formats

docx Converting to and from Word OOXML
epub Converting to ePub
fo2 Making XSL FO
latex2 Making LaTeX
nlm Converting from NLM
odds2 Transforming TEI ODD specifications
odt Converting to and from OpenOffice Writer
slides2 Making slides (HTML and PDF)
tite Converting from TEI Tite
xhtml2 Making HTML

Special directories

profiles Customizations
common2 Templates for any output format
tools2 Utilities

5.11. Content of HTML (or LaTeX or XSL FO) directory

core.xsl Basic TEI elements
dictionaries.xsl Dictionaries module
drama.xsl Drama module
figures.xsl Figures and tables module
header.xsl Header module
linking.xsl Linking module
namesdates.xsl Names and Dates module
tagdocs.xsl Processing ODDs
tei-param.xsl Parameters
tei.xsl Top-level wrapper
textcrit.xsl Text critical module
textstructure.xsl Basic structure
transcr.xsl Transcription module
verse.xsl Verse module

5.12. Layout of a profile directory

5.13. Profile conventions

  • profiles are stored in a directory hierarchy of the form name/format/from.xsl or name/format/to.xsl (indicating whether it is a conversion from or to the format)
  • known formats are: csv, docbook, docx, epub, fo, html, latex, oo, p4 and (special cases for ODD processing) lite, oddhtml, rdf, dtd, and relaxng.
  • references to the ‘master’ conversions should be in the form (eg)
    <xsl:import href="../../../epub/tei-to-epub.xsl"/>

5.14. Areas of customization (HTML)

  • Standard page features
  • Layout
  • Headings
  • Numbering
  • Output
  • Table of contents generation
  • Internationalization
  • CSS
  • Tables
  • Figures and graphics
  • Inline style

Remember that in HTML a lot will be done with CSS and JavaScript

5.15. Understanding the customization

There are five levels of interaction with the stylesheet family:
  1. setting parameters
  2. overriding templates provided for this purpose (listed in the customization guide)
  3. writing templates which implement the empty ‘hooks’ (listed in the customization guide)
  4. adding new templates for elements not covered by the family
  5. providing complete replacements for low-level templates
Always make changes by overriding — never hack the originals!
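
As a rough sketch of the first and third levels, a local stylesheet might set one parameter and fill in one hook. The hook name bodyHook and the banner text are assumptions made for illustration; check the customization guide for the hook names that are actually provided.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://www.w3.org/1999/xhtml">
 <xsl:import
   href="http://www.tei-c.org/release/xml/tei/stylesheet/xhtml2/tei.xsl"/>

 <!-- level 1: set a parameter -->
 <xsl:param name="cssFile">myTEI.css</xsl:param>

 <!-- level 3: fill in an empty hook (the name bodyHook is an assumption) -->
 <xsl:template name="bodyHook">
  <p class="banner">Draft: not for citation</p>
 </xsl:template>
</xsl:stylesheet>
```

The originals are left untouched; everything local lives in this one small file.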

5.16. Many parameters

There are dozens and dozens of parameters which affect the stylesheet output; you can set values for these by
  • specifying parameter names and values directly in oXygen
  • setting them on a command line
  • constructing a small local stylesheet which imports the public one, and adds overrides

5.17. Invoking an XSLT transform from oXygen

When you have loaded an XML file, look for the transformation symbol in the toolbar and press it.

The first time, it will ask you which transformation scenario to use:

5.18. Simple result

5.19. Configuring the scenario in oXygen

Look for the scenario configuration symbol. This produces a dialog asking if you want to change the setup. Choose yes, and you see the scenario settings.

5.20. Changing parameters in oXygen

Now you can supply values for parameters.

5.21. Change pageLayout

5.22. 2 column display

5.23. Changing things around a bit

5.24. Using a wrapper stylesheet

The simplest example of making a wrapper for the HTML stylesheets is:

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:import
   href="http://www.tei-c.org/release/xml/tei/stylesheet/xhtml2/tei.xsl"/>

</xsl:stylesheet>

5.25. Using a wrapper stylesheet (2)

Now you can build on it:

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:import
   href="http://www.tei-c.org/release/xml/tei/stylesheet/xhtml2/tei.xsl"/>

 <xsl:param name="logoFile">../../logo.png</xsl:param>
 <xsl:param name="logoWidth">60</xsl:param>
 <xsl:param name="logoHeight">60</xsl:param>
 <xsl:param name="cssFile">myTEI.css</xsl:param>
 <xsl:param name="pageLayout">CSS</xsl:param>
 <xsl:param name="outputMethod">xml</xsl:param>
 <xsl:param name="parentWords">The Punch
   Project</xsl:param>
 <xsl:param name="institution">The University of
   Punch</xsl:param>
</xsl:stylesheet>

5.26. Using a wrapper stylesheet (3)

And start to add your own templates:

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:tei="http://www.tei-c.org/ns/1.0">
 <xsl:import
   href="http://www.tei-c.org/release/xml/tei/stylesheet/xhtml2/tei.xsl"/>

 <xsl:param name="logoFile">../../logo.png</xsl:param>
 <xsl:param name="logoWidth">60</xsl:param>
 <xsl:param name="logoHeight">60</xsl:param>
 <xsl:param name="cssFile">myTEI.css</xsl:param>
 <xsl:param name="pageLayout">CSS</xsl:param>
 <xsl:param name="outputMethod">xml</xsl:param>
 <xsl:param name="parentWords">The Punch
   Project</xsl:param>
 <xsl:param name="parentURL">http://tei.oucs.ox.ac.uk/Punch/</xsl:param>
 <xsl:param name="institution">The
   University of Punch</xsl:param>
 <xsl:template match="tei:hi[@rend='upsidedown']">
  <span class="upsidedown">
   <xsl:apply-templates/>
  </span>
 </xsl:template>
</xsl:stylesheet>

5.27. OxGarage

OxGarage is a web interface to the XSL stylesheets and their profiles: http://oxgarage.oucs.ox.ac.uk:8080/ege-webclient

OxGarage lets you:
  • generate schemas using the same tools as Roma
  • convert documentation to HTML, ePub, and DOCX
  • convert between TEI XML and Word DOCX
  • perform all the ODD tasks using web services
  • chain sets of transformations together

5.28. Key features of OxGarage

  • Built on EU-funded ENRICH project’s EGE for converting manuscript descriptions (University of Poznan)
  • Chained XSLT conversions
  • Uses TEI as pivot format
  • Read/write OpenOffice and Open XML
  • Provides route from Word to ePub
  • Supports “profiles” for variations

5.29. Matrix of OxGarage conversions

5.30. OxGarage web service example (1)

Process ODD to compiled ODD, then to TEI Lite, then to DOCX
curl -s -F upload=@test.odd -o test.docx http://oxgarage.oucs.ox.ac.uk:8080/ege-webservice/Conversions/ODD%3Atext%3Axml/ODDC%3Atext%3Axml/TEI%3Atext%3Axml/docx%3Aapplication%3Avnd.openxmlformats-officedocument.wordprocessingml.document/

5.31. OxGarage web service example (2)

ODD to HTML, in French

curl -s -F upload=@test.odd -o test.html "http://oxgarage.oucs.ox.ac.uk:8080/ege-webservice/Conversions/ODD%3Atext%3Axml/ODDC%3Atext%3Axml/oddhtml%3Aapplication%3Axhtml%2Bxml/?properties=<conversions><conversion%20index='1'><property%20id='oxgarage.lang'>fr</property></conversion></conversions>"

6. Your problems here

  • 1. Add and replace text or mark-up in my transcription automatically in other ways than by using the find-replace function.
  • 2. View transcription in different browsers
  • 3. View different levels of transcription in a browser
  • 4. Publish transcription online
  • 5. Convert transcription into PDF
  • 6. Publish PDF online
  • 7. XQuery: extract data out of my transcriptions and ms descriptions and export them into Excel or some other programme that does statistical analysis (e.g. the percentage of abbreviations in a text)


Sebastian Rahtz, Director (Research) of Academic IT, University of Oxford IT Services. Date: March 11th 2013
Copyright University of Oxford