
1. Following the TEI spirit

Conformance to the TEI means:
  • Sharing a common text encoding culture
  • Sharing the same vocabulary (when applicable)
  • Allowing user autonomy in defining modifications (extensions, customization), but sharing the mechanisms to do so
The TEI gives you a lot of help in following these rules.

2. Important concepts

The TEI's literate programming with ODD (One Document Does it all) provides:
  • Schema specification
  • User oriented documentation
  • Modularity: all specifications pertaining to a coherent sub-domain of the TEI are grouped into one module
  • Classes: identifying shared behaviours or semantics
  • Extensibility: a consequence of the above mechanisms

3. The TEI ODD in practice

The TEI Guidelines, their schemas, and their schema fragments are all produced from a single XML resource containing:
  1. Descriptive prose (lots of it)
  2. Examples of usage (plenty)
  3. Formal declarations for components of the TEI Abstract Model:
    • elements and attributes
    • modules
    • classes and macros

4. Possibilities of customizing the TEI

The TEI has over 20 modules. A working project will:
  • Choose the modules it needs
  • Probably narrow the set of elements within a module
  • Probably add local datatype constraints
  • Possibly add new elements
  • Possibly localize the names of elements
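
As a minimal sketch of the first and last points, a customization might select four modules and localize one element name. The schema identifier myProject and the replacement name para are assumptions made for illustration only.

```xml
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="myProject">
  <!-- choose the modules the project needs -->
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
  <!-- localize a name: <p> appears as <para> in the generated schema
       ("para" is an illustrative choice, not a TEI recommendation) -->
  <elementSpec ident="p" module="core" mode="change">
    <altIdent>para</altIdent>
  </elementSpec>
</schemaSpec>
```

The element keeps its canonical identifier p inside the ODD, so the renamed schema can still be mapped back to standard TEI.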

5. Real life TEI customization

We aim to support a range of interactions with the TEI:
  • Easy TEI: simple access to the TEI through Roma
  • Subsetting the TEI: making the TEI even easier to use
  • Enlarging the application profile: using modules
  • Modifying the TEI objects: first insights into extensibility
  • Behind the scenes (ODD): starting to use the actual specification language

6. Quick and simple access to the TEI

Imagine that you have seen your colleague next door doing some encoding with the TEI and want to do the same thing:
  • Go to Roma at http://tei.oucs.ox.ac.uk/Roma/
  • Experiment with the user profile [Customize]
  • Generate a schema [Schema]
  • Try the schema out in an editor by creating a simple document
  • Go back to Roma and generate basic documentation

7. Roma:Start

8. Roma:Schema Select

9. Roma:Generate Doc

10. Subsetting the TEI

Suppose you now want to use some more of the TEI, but not all of it:

  • Go to Roma…
  • Look at [Modules]
  • Explore default modules by pointing to main elements (by order of interest). You can throw away most things, but
    • In textstructure, you should really keep <TEI>, <text>, <body> and <div>
    • In core, most people need <p>, <q>, <list>, <pb> and <head>
    • From header, keep everything unless you really understand the details
  • Start checking out elements
  • Make editorial choices (numbered vs. unnumbered heads)
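
Behind the Roma screens, discarding elements comes down to a few declarations in the ODD. The sketch below assumes a project with no drama, so the three drama-related elements of the core module are deleted; the identifier mySubset and the choice of elements are illustrative only.

```xml
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="mySubset">
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
  <!-- remove elements the project does not need (illustrative choice) -->
  <elementSpec ident="sp" module="core" mode="delete"/>
  <elementSpec ident="speaker" module="core" mode="delete"/>
  <elementSpec ident="stage" module="core" mode="delete"/>
</schemaSpec>
```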

11. Roma:Customize

12. Roma:Modules

13. Roma:Change Module

14. A word of caution

  • The TEI is not a monolithic environment
  • Very few things are really mandatory …
  • …but the TEI is more than just a market place
  • Basic document structure must be preserved
The TEI is a powerful environment for working with elements and producing documentation, but do not abuse it.

15. Modifying TEI objects

Understanding classes is critical.
  • They group together elements with the same role in the TEI architecture
  • They group together elements with the same syntactic behaviour
  • Classes can provide attributes for groups of like-minded elements
  • The elements in the class will appear in the same content models
A class defines a group of elements belonging to the same family of concepts; elements declare themselves as belonging to a class.
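
Because attributes can be provided by a class, changing the class changes every member at once. A minimal sketch, assuming that att.typed (in the tei module) supplies both a type and a subtype attribute, and that the project wants to drop subtype everywhere:

```xml
<classSpec xmlns="http://www.tei-c.org/ns/1.0"
           ident="att.typed" type="atts" module="tei" mode="change">
  <attList>
    <!-- every element that is a member of att.typed loses this attribute -->
    <attDef ident="subtype" mode="delete"/>
  </attList>
</classSpec>
```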

16. Roma:Show Class

17. Roma:Change Class Attributes

18. Roma:Edit Class Attributes

19. Adding TEI objects

You can add your own elements and attributes. But
  • make very sure you are not just making something which is syntactic sugar for an existing TEI concept
  • do not add a new element merely to rename an existing one; renaming can be done directly in ODD
  • if you want facilities from a very different field of discourse, such as maths or vector graphics, use the existing standards in that area
  • consider interoperability

20. Roma: Add Element

21. Roma:New Attribute

22. Roma:Language Selection

23. Roma:In French

24. Under the hood

TEI customizations are themselves expressed in TEI XML, using elements from the tagdocs module.

For example:
<schemaSpec ident="myTEIlite">
 <desc>This is TEI Lite with simplified heads</desc>
 <moduleRef key="tei"/>
 <moduleRef key="core"/>
 <moduleRef key="textstructure"/>
 <moduleRef key="header"/>
 <moduleRef key="linking"/>
 <elementSpec ident="head" mode="change">
  <!-- ... -->
 </elementSpec>
</schemaSpec>
produces something like TEI Lite, with a slight change

25. ODD processors

  • The TEI maintains a library of XSLT scripts that can generate
    • The TEI Guidelines in canonical TEI XML format
    • The Guidelines in HTML or PDF
    • RELAX NG, DTD, or W3C Schema fragments
  • The same library is used by the customization layer to generate
    • project-specific documentation
    • project-specific schemas
    • translations into other (human) languages
  • We use eXist as a database for extracting material from the P5 sources

26. The TEI abstract model

  • The TEI abstract model sees a markup scheme (a schema) as consisting of a number of discrete modules, which can be combined more or less as required.
  • A schema is made by combining references to modules and optional element over-rides or additions
  • Each element declares the module it belongs to: elements cannot appear in more than one module.
  • Each module extends the range of elements and attributes available by adding new members to existing classes of elements, or by defining new classes.

27. Expression of TEI content models

Within the class system, TEI elements have to be defined using some language notation; choices include:
  1. using XML DTD language (as in older versions of the TEI)
  2. using W3C Schema language
  3. using the RELAX NG schema language
  4. inventing an entirely new abstract language for later transformation to a specific schema language
We chose a combination of 3 and 4: using our own abstract language, but switching to RELAX NG for content modelling.
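
A minimal sketch of that combination: the outer elements belong to the abstract ODD language, while the content model is plain RELAX NG in its own namespace. The element name aside is hypothetical and exists only for this illustration; model.phrase is assumed to be the TEI class of phrase-level elements.

```xml
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             xmlns:rng="http://relaxng.org/ns/structure/1.0"
             ident="aside" mode="add">
  <desc>a hypothetical element, used here only for illustration.</desc>
  <content>
    <!-- RELAX NG: any mixture of text and phrase-level elements -->
    <rng:zeroOrMore>
      <rng:choice>
        <rng:text/>
        <rng:ref name="model.phrase"/>
      </rng:choice>
    </rng:zeroOrMore>
  </content>
</elementSpec>
```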

28. Why that combination?

  • Expressing constraints in XML language is too attractive to forego
  • There is a clamour for better datatyping than DTDs have
  • The schema languages are so good, it is silly to reinvent them
  • But we like our class system and literate programming

29. DTD vs RELAX NG vs W3C Schema

  • DTDs are not XML, and need specialist software
  • W3C Schema is not consistently implemented, its documentation is vast and confusing, and it looks over-complex
  • RELAX NG on the other hand…
    • uncluttered design
    • good documentation
    • multiple open source 100%-complete implementations
    • ISO standard
    • useful features for multipurpose structural validation
No contest…

30. What does an ODD look like?

<elementSpec module="spoken" ident="pause">
  <desc>a pause either between or within utterances.</desc>
  <classes>
    <memberOf key="model.divPart.spoken"/>
    <memberOf key="att.timed"/>
    <memberOf key="att.typed"/>
  </classes>
  <attList>
    <attDef ident="who" usage="opt">
      <gloss>A unique identifier</gloss>
      <desc>supplies the identifier of the
        person or group pausing.
        Its value is the identifier of a <gi>person</gi>
        or <gi>persGrp</gi> element in the TEI header.</desc>
      <datatype>
        <rng:ref name="data.pointer"/>
      </datatype>
    </attDef>
  </attList>
</elementSpec>

31. ... from which we generate

element pause { pause.content, pause.attributes }
pause.content = empty
pause.attributes =
model.divPart.spoken |= pause
att.timed |= pause
att.typed |= pause
att.ascribed |= pause

32. ... or

<!ELEMENT %n.pause; %om.RR; EMPTY>
<!ATTLIST %n.pause; %att.global.attributes; %att.timed.attributes; %att.typed.attributes; %att.ascribed.attributes;>
<!ENTITY % model.divPart.spoken "%x.model.divPart.spoken; %n.event; | %n.kinesic; | %n.pause; | %n.shift; | %n.u; | %n.vocal; | %n.writing;">

33. ... and, indeed

34. A more complex example

<elementSpec module="corpus" ident="birth">
  <gloss>Birth details</gloss>
  <desc>contains information about a person's birth,
    such as its date and place.</desc>
  <classes>
    <memberOf key="model.personPart"/>
  </classes>
  <content>
    <rng:ref name="macro.phraseSeq"/>
  </content>
  <attList>
    <attDef ident="date" usage="opt">
      <desc>specifies the date of birth in an ISO
        standard form (yyyy-mm-dd).</desc>
      <datatype>
        <rng:ref name="data.temporal"/>
      </datatype>
    </attDef>
  </attList>
</elementSpec>

35. Which produces …

36. And some XSD for a change …

<xsd:element name="birth">
  <xsd:annotation>
    <xsd:documentation>(Birth details) contains information
      about a person's birth, such as its date
      and place.</xsd:documentation>
  </xsd:annotation>
  <xsd:complexType>
    <xsd:complexContent>
      <xsd:extension base="birth.content">
        <xsd:attributeGroup ref="ns1:birth.attributes"/>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:element>
<xsd:complexType name="birth.content">
  <xsd:complexContent><xsd:extension base="macro.phraseSeq"/></xsd:complexContent>
</xsd:complexType>

37. Adding a new element

 <moduleRef key="header"/>
 <moduleRef key="core"/>
 <moduleRef key="text"/>
 <moduleRef key="textstructure"/>
 <elementSpec ident="soundClip" mode="add">
   <desc>includes an audio object in a document.</desc>
   <classes>
     <memberOf key="model.data"/>
   </classes>
   <attList>
     <attDef ident="location">
       <desc>supplies the location of the clip</desc>
       <datatype>
         <rng:ref name="data.pointer"/>
       </datatype>
     </attDef>
   </attList>
 </elementSpec>

38. Uniformity of description

  • modules, elements, attributes, value-lists are treated uniformly
  • each has an identifier, a gloss, a description, and one or more equivalents
  • each can be added, changed, replaced, deleted within a given context
  • for example, membership in the att.typed class gives you a generic type attribute, which can be over-ridden for specific class members

39. Overriding a value-list

<elementSpec ident="list" module="core">
  <classes>
    <memberOf key="att.typed"/>
  </classes>
  <attList>
    <attDef ident="type" mode="replace">
      <valList type="closed">
        <valItem ident="ordered"><gloss>Items are ordered</gloss></valItem>
        <valItem ident="bulleted"><gloss>Items are bulleted</gloss></valItem>
        <valItem ident="frabjous"><gloss>Items are frabjous</gloss></valItem>
      </valList>
    </attDef>
  </attList>
</elementSpec>

40. Ontological mapping

The <equiv> element supplies a URI which identifies an equivalent concept (not a name) in some externally-defined ontology, e.g.
  • ISO data category registry
  • CIDOC conceptual reference model
  • Wordnet
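
A minimal sketch of such a mapping, attached to the birth element used earlier. The URI is a placeholder invented for illustration and does not point at a real ontology entry.

```xml
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             ident="birth" module="corpus" mode="change">
  <!-- placeholder URI: replace with the identifier of the
       equivalent concept in the chosen external ontology -->
  <equiv uri="http://example.org/ontology/birth-event"/>
</elementSpec>
```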

Lou Burnard, Sebastian Rahtz, Laurent Romary. Date: February 2007
Copyright University of Oxford