IT Services, University of Oxford

1. TEI ODD under the bonnet

The TEI Guidelines are written in the ODD format

The source code for TEI P5 (available from http://www.tei-c.org/release/xml/tei/odd/Source/) contains:
  • 39 TEI-XML files, 25 of which correspond to published chapters, most of which define a module (for example PH-PrimarySources.xml)
  • 778 TEI-XML files, each defining an element, a class, or a macro; more specifically:
    • 29 datatype macros (data.xxxx) for example data.sex.xml
    • 116 model classes (model.xxxx) for example model.biblLike.xml
    • 71 attribute classes (att.xxxx) for example att.divLike.xml
    • 8 general purpose macros (macro.xxxx) for example macro.phraseSeq.xml
    • 555 element specifications, from ab.xml to zone.xml

In this talk, we'll look at each of these a bit more closely...

2. Physical organisation

Physical organization (as files) should not be confused with logical organization (as specifications, etc.)

  • The file guidelines-XX.xml is the ‘driver file’ for the version of the Guidelines in language XX.
  • It contains some preliminaries (a TEI Header, a title page, etc.), followed by one line like the following for each chapter:
    <include xmlns="http://www.w3.org/2001/XInclude"
    href="Guidelines/en/HD-Header.xml"/>
  • Within each chapter file there are similar XInclude statements for the objects declared by that chapter

3. Logical organization

  • At the end of each chapter that defines a module, there is a <moduleSpec> element which other specifications can reference
  • Specifications may also be grouped together for convenience into <specGrp> elements, which can then be referenced as required by <specGrpRef> elements
  • Each <specGrp> contains the actual object specifications, again using XInclude
  • Each object specification, whether included in a <specGrp> or not, indicates the module to which it belongs

4. For example ...

The corresponding passage of generated text in the English Guidelines is produced from this ODD code:
<div>
 <head>Module for Tables, Formulæ, Notated Music, and Graphics</head>
 <p>The module described in this chapter provides the following features: <moduleSpec xml:id="DFTFF" ident="figures">
   <altIdent type="FPI">Tables, Formulæ, Notated Music, Figures</altIdent>
   <desc>Tables, formulæ, notated music, and figures</desc>
   <desc xml:lang="fr">Tableaux, formules et graphiques</desc>
   <desc xml:lang="zh-TW">表格、方程式與圖表</desc>
<!-- ... -->
  </moduleSpec> The selection and combination of modules to form a TEI schema is
   described in <ptr target="#STIN"/>. <specGrpRef target="#DFTTAB"/>
  <specGrpRef target="#DFTFOR"/>
  <specGrpRef target="#DFTNTM"/>
  <specGrpRef target="#DFTGRA"/>
 </p>
</div>

5. The pointers lead to things like this

<specGrp xml:id="DFTTAB" n="Tables">
 <include xmlns="http://www.w3.org/2001/XInclude" href="../../Specs/table.xml"/>
 <include xmlns="http://www.w3.org/2001/XInclude" href="../../Specs/row.xml"/>
 <include xmlns="http://www.w3.org/2001/XInclude" href="../../Specs/cell.xml"/>
</specGrp>

and each XInclude brings in an actual specification, for example:

<elementSpec module="figures" ident="cell">
 <gloss versionDate="2007-06-12" xml:lang="fr">cellule</gloss>
 <desc versionDate="2005-01-14" xml:lang="en">contains one cell of a table.</desc>
<!-- ... -->
 <listRef>
  <ptr target="#FTTAB1" type="div1"/>
 </listRef>
</elementSpec>

6. What sort of "objects" are specified?

  • datatypes
  • model classes
  • attribute classes
  • macros
  • ...and elements

We'll take a closer look at an example of each...

7. An ODD specification

Open data.sex.xml with oXygen

Like any other TEI specification file ...
  • It's an XML document which can be validated against the schema file specified
  • It has two open source licences
  • It has <desc> elements and comments (<remarks>), repeated in several languages, each with an xml:lang and a versionDate
  • It has an identifier, supplied by the ident attribute
  • It belongs to the module named by the module attribute
  • It has some additional commentary tagged <remarks>
  • It has cross references to parts of the Guidelines where it is discussed, grouped within a <listRef> element.

8. A datatype specification

  • Datatypes are currently represented as macros of a specific type. This may change.
  • The <content> element maps this datatype to another, in this case data.word
  • Ultimately, all TEI datatypes map to one or more of
    • a specific token or pattern
    • an XSD base datatype
  • The data.word macro (used throughout P5) is defined by a pattern which matches only letters, digits, punctuation characters, or symbols, and which cannot include whitespace.
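The chain from data.sex down to a base datatype can be sketched as follows. This is a simplified reconstruction, not a copy of the P5 source files: the <specGrp> wrapper is there only to make the fragment well-formed, and the regular expression for data.word is an assumption based on the description above, so check data.word.xml in your release for the exact pattern.

```xml
<!-- Simplified sketch, not copied from the P5 sources: the <specGrp>
     wrapper and the exact pattern are assumptions -->
<specGrp xmlns="http://www.tei-c.org/ns/1.0"
         xmlns:rng="http://relaxng.org/ns/structure/1.0">
  <!-- data.sex simply points at another datatype -->
  <macroSpec module="tei" type="dt" ident="data.sex">
    <content>
      <rng:ref name="data.word"/>
    </content>
  </macroSpec>
  <!-- data.word ends in an XSD base type (token) restricted by a pattern:
       one or more letters (\p{L}), numbers (\p{N}), punctuation (\p{P})
       or symbols (\p{S}); whitespace is therefore excluded -->
  <macroSpec module="tei" type="dt" ident="data.word">
    <content>
      <rng:data type="token"
                datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
        <rng:param name="pattern">(\p{L}|\p{N}|\p{P}|\p{S})+</rng:param>
      </rng:data>
    </content>
  </macroSpec>
</specGrp>
```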

9. A model class specification

Open model.biblLike.xml with oXygen

  • A model class specification exists only in order to be pointed at by other specifications, so not much to see here.
  • As well as the items we saw before, notice that there is a <listRef> at the end, providing cross references to other parts of the Guidelines where this class is discussed
  • The <classes> element is used (both here and elsewhere) to show classes to which this class belongs. Each <memberOf> element references a class of which this one is a specialisation.
  • To see more clearly the hierarchy of classes, look at the way this class is presented in the documentation: http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-model.biblLike.html
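Stripped to its essentials, a model class specification is little more than an identifier, a description and its class memberships. In the following sketch the class model.recipeLike is hypothetical (it is not part of TEI P5); it declares itself a specialisation of the existing class model.divPart:

```xml
<!-- Hypothetical model class: "model.recipeLike" is invented for
     illustration and is not part of TEI P5 -->
<classSpec xmlns="http://www.tei-c.org/ns/1.0"
           type="model" ident="model.recipeLike">
  <desc xml:lang="en">groups elements which contain a recipe.</desc>
  <classes>
    <!-- a specialisation of model.divPart: wherever model.divPart is
         referenced, members of this class are allowed too -->
    <memberOf key="model.divPart"/>
  </classes>
</classSpec>
```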

10. An attribute class specification

Open att.divLike.xml with oXygen

  • To see the structure of this spec more clearly, open the Outline view if necessary (Window -> Show View -> Outline)
  • The list of attributes inherited by members of this class is supplied by an <attList>, containing two <attDef> elements (org and sample)
  • The possible values for an attribute are specified by its <datatype>
  • Where they can be enumerated, attribute values are documented by a <valList>, either open or closed, supplying a description for each possible value
  • This class is a member of two others (att.metrical and att.fragmentable): an element which is a member of this class will therefore also inherit the attributes defined by those two classes, if they are available
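The structure described above can be sketched as follows. This is a simplified version modelled on att.divLike, showing only the org attribute: the descriptions are paraphrased rather than quoted, and the datatype reference data.enumerated is assumed from P5 releases of this period.

```xml
<!-- Simplified sketch modelled on att.divLike: descriptions paraphrased,
     @sample and the class memberships omitted -->
<classSpec xmlns="http://www.tei-c.org/ns/1.0"
           xmlns:rng="http://relaxng.org/ns/structure/1.0"
           type="atts" ident="att.divLike" module="tei">
  <desc xml:lang="en">provides attributes common to elements which behave like divisions.</desc>
  <attList>
    <attDef ident="org" usage="opt">
      <desc xml:lang="en">specifies how the content of the division is organized.</desc>
      <!-- the datatype constrains the form of the attribute value -->
      <datatype>
        <rng:ref name="data.enumerated"/>
      </datatype>
      <!-- a closed list: only the values enumerated here are valid -->
      <valList type="closed">
        <valItem ident="composite">
          <desc xml:lang="en">the parts need not be processed in any particular sequence.</desc>
        </valItem>
        <valItem ident="uniform">
          <desc xml:lang="en">the parts form a logical unit, to be processed in sequence.</desc>
        </valItem>
      </valList>
    </attDef>
  </attList>
</classSpec>
```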

11. A macro specification

Open macro.phraseSeq.xml with oXygen

  • Some elements use a named macro to define common content models.
  • In the current version of P5, all macros express their replacement values in RELAX NG syntax. This may change.
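As an approximate sketch (the exact list of classes may differ between releases, so check macro.phraseSeq.xml itself), the macro might be declared like this, with its replacement value written in RELAX NG:

```xml
<!-- Sketch of macro.phraseSeq; the classes listed may differ between releases -->
<macroSpec xmlns="http://www.tei-c.org/ns/1.0"
           xmlns:rng="http://relaxng.org/ns/structure/1.0"
           module="tei" type="pe" ident="macro.phraseSeq">
  <content>
    <!-- any mixture of text and phrase-level elements, in any order -->
    <rng:zeroOrMore>
      <rng:choice>
        <rng:text/>
        <rng:ref name="model.gLike"/>
        <rng:ref name="model.phrase"/>
        <rng:ref name="model.global"/>
      </rng:choice>
    </rng:zeroOrMore>
  </content>
</macroSpec>
```

An element whose content is defined by this macro then needs only <rng:ref name="macro.phraseSeq"/> inside its <content> element.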

12. An element specification

Open abstract.xml with oXygen

This spec was created comparatively recently (2012-12-27) and so its <desc> has yet to be translated. Check that you recognise and understand its main components:

  • The <elementSpec> has attributes module and ident identifying its module and its canonical name
  • The <classes> element specifies to which classes it belongs
  • The <content> element specifies its possible content (in this version, in RELAX NG rather than Pure ODD)
  • The <exemplum> element contains a usage example
  • Additional comments (<remarks>) and cross references (<listRef>)
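Putting those components together, a skeleton of the <abstract> specification might look like this. It is a sketch rather than a copy of abstract.xml: the description is abridged, <remarks> and <listRef> are omitted, the example text is invented, and the content model is assumed to be the RELAX NG one shown under "Purifying a content model".

```xml
<!-- Skeleton of the abstract spec: description abridged;
     <remarks> and <listRef> omitted -->
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             xmlns:rng="http://relaxng.org/ns/structure/1.0"
             module="header" ident="abstract">
  <desc xml:lang="en">contains a summary or formal abstract.</desc>
  <!-- class memberships: inherited attributes and permitted contexts -->
  <classes>
    <memberOf key="att.global"/>
    <memberOf key="att.responsibility"/>
    <memberOf key="model.profileDescPart"/>
  </classes>
  <!-- content model, here in RELAX NG -->
  <content>
    <rng:oneOrMore>
      <rng:choice>
        <rng:ref name="model.pLike"/>
        <rng:ref name="model.listLike"/>
      </rng:choice>
    </rng:oneOrMore>
  </content>
  <!-- usage example, in the TEI Examples namespace -->
  <exemplum xml:lang="en">
    <egXML xmlns="http://www.tei-c.org/ns/Examples">
      <abstract>
        <p>An invented one-paragraph summary.</p>
      </abstract>
    </egXML>
  </exemplum>
</elementSpec>
```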

13. TEI classes

You need to understand the TEI class system to understand how ODDs work...

  • An attribute class, named something like att.global, provides attributes
  • A model class, named something like model.profileDescPart, identifies a set of elements which have similar semantics, or which can appear in the same contexts:
    • model.xxxLike : the class of elements LIKE an xxx
    • model.xxxPart : the class of elements that can be PART OF an xxx
  • One class can inherit properties from another
  • Model classes are particularly useful because they provide a means of indirectly specifying content models

14. For example

<classes>
 <memberOf key="att.global"/>
 <memberOf key="att.responsibility"/>
 <memberOf key="model.profileDescPart"/>
</classes>

This tells us that...

  • <abstract> is a member of the model.profileDescPart class
  • and can therefore appear inside <profileDesc>, the content of which is defined as
    <classRef key="model.profileDescPart" maxOccurs="unbounded"/>
  • as a member of att.responsibility, <abstract> inherits attributes cert and resp
  • because att.responsibility is itself a subclass of the class att.source, the attribute source is also available.
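The same mechanism is what makes model classes useful for indirect content models. As a hypothetical sketch (the element name projectNote and its namespace URI are invented for illustration, and the spec would sit inside a customization's <schemaSpec>), a new element only has to declare membership of model.profileDescPart to become available inside <profileDesc>, without any change to the specification of <profileDesc> itself:

```xml
<!-- Hypothetical customization: "projectNote" and its namespace URI are
     invented for illustration; they are not part of TEI P5 -->
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             xmlns:rng="http://relaxng.org/ns/structure/1.0"
             ident="projectNote" mode="add"
             ns="http://example.org/ns/project">
  <desc xml:lang="en">contains a note about how this project encoded the text.</desc>
  <classes>
    <memberOf key="att.global"/>
    <!-- joining this model class is enough to make the element valid
         wherever model.profileDescPart is referenced, e.g. in profileDesc -->
    <memberOf key="model.profileDescPart"/>
  </classes>
  <content>
    <rng:ref name="macro.phraseSeq"/>
  </content>
</elementSpec>
```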

15. Defining a content model

  • The current source of TEI P5 uses the language RELAX NG to define content models (earlier versions used SGML DTD language)
  • As you have seen, ODD also provides ways of defining content models without recourse to another language
  • Pure ODD has been part of TEI P5 since release 2.6.0 (Dec 2013), but is still in beta test.
  • Pure ODD is more expressive than DTD or XSD

16. Purifying a content model

<content>
 <rng:oneOrMore>
  <rng:choice>
   <rng:ref name="model.pLike"/>
   <rng:ref name="model.listLike"/>
  </rng:choice>
 </rng:oneOrMore>
</content>
which in Pure ODD becomes:
<content>
 <alternate maxOccurs="unbounded">
  <classRef key="model.pLike"/>
  <classRef key="model.listLike"/>
 </alternate>
</content>

See Burnard and Rahtz 2013 for more details and rationale...
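The same purification applies to other patterns. For example, taking the RELAX NG pattern (text | model.gLike | model.phrase | model.global)*, which is roughly how mixed phrase-level content is defined in P5, a Pure ODD equivalent might look like this sketch:

```xml
<!-- Pure ODD sketch of (text | model.gLike | model.phrase | model.global)* -->
<content xmlns="http://www.tei-c.org/ns/1.0">
  <!-- minOccurs="0" with maxOccurs="unbounded" plays the role of rng:zeroOrMore -->
  <alternate minOccurs="0" maxOccurs="unbounded">
    <!-- textNode corresponds to rng:text -->
    <textNode/>
    <classRef key="model.gLike"/>
    <classRef key="model.phrase"/>
    <classRef key="model.global"/>
  </alternate>
</content>
```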

17. What is a class reference?

When a <classRef> appears inside <content>, its meaning is determined by the value of the expand attribute

If the class being referenced has 3 members x, y, z...

value of expand                 meaning
alternate (default)             (x | y | z)
sequence                        (x, y, z)
sequenceOptional                (x?, y?, z?)
sequenceOptionalRepeatable      (x*, y*, z*)
sequenceRepeatable              (x+, y+, z+)

The attributes minOccurs and maxOccurs are also available; they control how many times the whole expansion may occur.

You can also select or suppress some of the members of a class, using the attributes include and except respectively.
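As an illustration (the combination below is invented, not taken from any TEI element), the following Pure ODD content model uses both a non-default expand value and the except attribute:

```xml
<!-- Illustrative Pure ODD content model; the combination is invented -->
<content xmlns="http://www.tei-c.org/ns/1.0">
  <sequence>
    <!-- expand="sequenceOptionalRepeatable": every member of model.pLike,
         each zero or more times, i.e. (x*, y*, z*) -->
    <classRef key="model.pLike" expand="sequenceOptionalRepeatable"/>
    <!-- default expansion (alternate): any one member except <table>;
         maxOccurs applies to the whole choice, i.e. (x | y)+ -->
    <classRef key="model.listLike" except="table" maxOccurs="unbounded"/>
  </sequence>
</content>
```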

18. Schematron constraints

  • An element spec may also include one or more <constraintSpec> elements, which contain additional constraints of any kind, expressed in the ISO Schematron language
  • In TEI we use these to express additional semantic or co-occurrence constraints that cannot be expressed in a grammar-based schema language such as RELAX NG, DTD or XSD
  • Not all XML processing systems take notice of these (but oXygen does).
  • They are also useful when implementing Pure ODD constructs that cannot be expressed in the target schema language

19. Schematron example

Open span.xml with oXygen

  • this spec has several <constraintSpec> elements, each of which has an ident and a scheme
  • one or more <constraint> elements can be supplied
  • each constraint is expressed in the namespace appropriate to the scheme indicated (here ISO schematron)
  • for example, one constraint checks that if @to is supplied on <name/>, @from must be supplied as well (NB <name/> here is a Schematron element, replaced in the message by the name of the element which triggered the rule)
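A minimal sketch of such a constraint might look like the following. The ident is hypothetical, the test is written to match the rule just described, and the scheme value isoschematron is the one assumed for P5 releases of this period; as the comment notes, it also assumes that the ODD processor supplies the rule context.

```xml
<!-- Sketch in the style of span.xml; the ident and message are illustrative -->
<constraintSpec xmlns="http://www.tei-c.org/ns/1.0"
                xmlns:sch="http://purl.oclc.org/dsdl/schematron"
                ident="hypotheticalToNeedsFrom" scheme="isoschematron">
  <constraint>
    <!-- no sch:rule here: the ODD processor is assumed to wrap this test in
         a rule whose context is the element being specified -->
    <sch:report test="@to and not(@from)">if @to is supplied on <sch:name/>, @from must be supplied as well</sch:report>
  </constraint>
</constraintSpec>
```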

You'll see more examples of this later...

20. ODD is also a customization language

It is used both to specify your choices from the TEI framework, and to specify the framework itself

A customization ODD specifies a selection or modification of the objects provided by another ODD, typically (but not necessarily) some release of the whole TEI framework, by:
  • selecting some modules
  • selecting or deleting some of the objects defined by those modules (elements, classes, datatypes, macros)
  • selecting or deleting some attributes
  • modifying or replacing some parts of the original specification (e.g. a valList or an example)
  • possibly adding entirely new objects

A customization ODD will often contain multiple specifications for the same object: one original, and one modified. ODD processing must unify these, following rules we will explain later.
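As a sketch of the "modifying" case, assume a customization that wants to close the list of values allowed for @type on <div> (the values chapter and section are invented for illustration). A specification in mode="change" supplies only the parts that differ, and the ODD processor merges it with the original specification of <div>:

```xml
<!-- Hypothetical customization: the values "chapter" and "section" are
     invented for illustration -->
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="testschema" start="TEI">
  <!-- the TEI infrastructure module: classes, macros, datatypes -->
  <moduleRef key="tei"/>
  <moduleRef key="core"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
  <!-- mode="change": only what is supplied here is merged into the
       original specification of div; everything else is kept -->
  <elementSpec ident="div" mode="change">
    <attList>
      <attDef ident="type" mode="change">
        <!-- mode="add" assumes the inherited @type has no value list yet;
             mode="replace" would be used to override an existing one -->
        <valList type="closed" mode="add">
          <valItem ident="chapter"/>
          <valItem ident="section"/>
        </valList>
      </attDef>
    </attList>
  </elementSpec>
</schemaSpec>
```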

21. A simple customization example

As you've already seen, we use the <schemaSpec> element to specify a schema, either a completely new one, or a customization.

  • The mandatory ident attribute gives a name for the schema
  • The start attribute indicates the name or names of the root element/s of the schema
  • The source attribute identifies the location of the TEI source being customized (this might be a specific version of TEI P5, or an existing customization ODD)
  • The docLang and targetLang attributes can be used to select the language to be used for element descriptions and element names respectively, where translations are available
<schemaSpec
  start="TEI"
  ident="testschema"
  source="tei:1.5.0"
  docLang="fr">

<!-- declarations -->
</schemaSpec>

22. Choosing by exclusion

You can specify just the elements you want to exclude, and take all the rest:
<schemaSpec start="TEI" ident="testschema">
 <moduleRef key="core" except="mentioned quote said"/>
 <moduleRef key="header"/>
 <moduleRef key="textstructure"/>
</schemaSpec>
also expressible as:
<schemaSpec start="TEI" ident="testschema">
 <moduleRef key="core"/>
 <moduleRef key="header"/>
 <moduleRef key="textstructure"/>
 <elementSpec ident="mentioned" mode="delete"/>
 <elementSpec ident="quote" mode="delete"/>
 <elementSpec ident="said" mode="delete"/>
</schemaSpec>

23. Choosing by inclusion

You can specify just the elements you want to include, and suppress all the rest:
<schemaSpec start="TEI" ident="testschema">
 <moduleRef key="core"/>
 <moduleRef key="header"/>
 <moduleRef key="textstructure" include="TEI text body div"/>
</schemaSpec>
also expressible as:
<schemaSpec start="TEI" ident="testschema">
 <moduleRef key="core"/>
 <moduleRef key="header"/>
 <elementRef key="TEI"/>
 <elementRef key="text"/>
 <elementRef key="div"/>
 <elementRef key="body"/>
</schemaSpec>

Take care! A module can define classes or macros as well as elements, but include and except apply only to elements.
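Attributes and classes therefore have to be handled separately. As a sketch, assuming you want to remove the n attribute everywhere (an arbitrary choice for illustration), a <classSpec> in mode="change" can delete it from att.global, and hence from every member of that class:

```xml
<!-- Sketch: removing an attribute inherited from a class;
     the choice of @n is only an example -->
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="testschema" start="TEI">
  <moduleRef key="tei"/>
  <moduleRef key="core"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
  <!-- include/except cannot reach attributes, so change the class itself:
       every member of att.global loses @n -->
  <classSpec ident="att.global" type="atts" mode="change">
    <attList>
      <attDef ident="n" mode="delete"/>
    </attList>
  </classSpec>
</schemaSpec>
```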

24. First customization exercise

Let's try to make a really simple schema called featherLite which might be used to mark up a linguistic corpus.

  • The TEI header will contain only the minimum required for TEI conformance (specifically: <teiHeader>, <fileDesc>, <titleStmt>, <publicationStmt>, <sourceDesc> and <title>)
  • The <text> element will contain just a <body>, composed of <div>s containing <ab>s
  • The <ab> elements will contain <s> elements composed of <w> or <pc> elements.
  • The only attributes we want are xml:id, xml:lang, type, and subtype

You'll do an exercise like this in the next session. For the moment, think about how you might write the <schemaSpec> you'll need.

25. First customization exercise (contd.)

  • Open the file feather-1.odd with oXygen and check that you understand what it is doing
  • Use oXygen to generate a schema from it in your favourite schema language.
  • Create a new TEI XML document using this schema. Check what elements and attributes the schema makes available.
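For comparison, one possible starting point is sketched below; it is not the contents of feather-1.odd. It assumes, as in P5 releases of this period, that <ab> is defined in the linking module and <s>, <w> and <pc> in the analysis module, and it adds <p> on the assumption that <publicationStmt> and <sourceDesc> need some content. It only selects elements: restricting the attributes, or the content of <s>, would need further specifications.

```xml
<!-- One possible starting point for featherLite, not the contents of
     feather-1.odd: it selects elements only -->
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="featherLite" start="TEI">
  <!-- infrastructure: classes, macros and datatypes -->
  <moduleRef key="tei"/>
  <moduleRef key="header" include="teiHeader fileDesc titleStmt publicationStmt sourceDesc"/>
  <!-- p is assumed to be needed for publicationStmt and sourceDesc -->
  <moduleRef key="core" include="title p"/>
  <moduleRef key="textstructure" include="TEI text body div"/>
  <moduleRef key="linking" include="ab"/>
  <moduleRef key="analysis" include="s w pc"/>
</schemaSpec>
```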

26. P.S. A word on TEI conformance.

What does it mean to be TEI-conformant?

  • you must be honest: elements in the TEI namespace must respect the semantics defined by the TEI
  • you must be explicit: supplying an ODD is the best way of showing exactly what modifications you made in your TEI customization
More formally...
  • Elements from the TEI namespace must be valid with respect to the TEI-ALL schema
  • Any extension of an existing TEI element should therefore be placed in a different namespace

These rules are intended to simplify the "blind interchange" of documents, but they don't guarantee it.



Lou Burnard Consulting.
Copyright University of Oxford