1. Documentation Elements

Elements defined by the tagdocs module: altIdent att attDef attList attRef classSpec classes code content datatype defaultVal eg egXML elementSpec equiv exemplum gi ident listRef macroSpec memberOf moduleRef moduleSpec remarks schemaSpec specDesc specGrp specGrpRef specList stringVal tag val valDesc valItem valList

1.1. Documenting XML Markup Schemes

  • The 'Documentation Elements' chapter defines elements useful in describing XML elements, attributes, and schemas
  • The TEI uses these elements to create the TEI Guidelines in its ODD (One Document Does it all) format
  • The documentation elements should be used any time one is describing XML elements
  • The ODD format, a TEI document using these elements in a particular way, can be used to customize the TEI to suit your needs

1.2. Multiple Outputs

An ODD processor (something which reads a TEI ODD file and creates a number of outputs from it) may be used to produce:
  • formal reference documentation for elements, attributes, element classes, such as those provided in Appendix C of the TEI Guidelines
  • detailed descriptive prose documentation, embedding some parts of the formal reference documentation, such as the tag description lists provided in this and other chapters of the Guidelines
  • declarative code for one or more XML schema languages, specifically RELAX NG or W3C Schema
  • declarative code for fragments which can be assembled to make up an XML Document Type Declaration.

1.3. Following the TEI spirit

Using the TEI means:
  • Sharing a common text encoding culture
  • Sharing the same vocabulary (when applicable)
  • Allowing user autonomy in defining modifications (extensions, customization), but sharing the mechanisms to do so
The TEI gives you a lot of help in following these rules.

1.4. Phrase Level Documentation Elements

  • <code> (literal code from some formal language)
  • <ident> (an identifier for an object of some kind in a formal language)
  • <att> (the name of an attribute appearing within running text)
  • <val> (a single attribute value)
  • <gi> (the name (generic identifier) of an element)
  • <tag> (text of a complete start- or end-tag, possibly including attribute specifications, but excluding the opening and closing markup delimiter characters)
  • <specList> (marks where a list of descriptions is to be inserted into the prose documentation)
  • <specDesc/> (a description of the specified element or class should be included at this point)
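
As a rough illustration of how the phrase-level elements above combine in running prose, here is a minimal sketch. The wording of the paragraph is invented for the example; the element and class names (list, item, att.typed) are the ones used elsewhere in these slides.

```xml
<div xmlns="http://www.tei-c.org/ns/1.0">
 <p>The <gi>list</gi> element carries a <att>type</att> attribute.
  A value such as <val>ordered</val> marks a numbered list, so the
  start-tag reads <tag>list type="ordered"</tag>. In the generated
  schema the class <ident>att.typed</ident> supplies that attribute
  through the pattern <code>att.typed.attributes</code>.</p>
 <!-- specList marks where formal descriptions of the named
      elements are inserted into the prose -->
 <specList>
  <specDesc key="list" atts="type"/>
  <specDesc key="item"/>
 </specList>
</div>
```

An ODD processor would be expected to expand each <specDesc/> into the description held in the corresponding specification, so the prose and the reference documentation cannot drift apart.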

1.5. Specification Elements

  • <elementSpec> (documents the structure, content, and purpose of a single element type)
  • <classSpec> (reference information for an element class)
  • <macroSpec> (documents the function and implementation of a pattern)
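
The three specification elements might look roughly like this when written out. This is a sketch only: att.myStatus, macro.myPlainText, myNote and the example.org namespace are hypothetical names invented for illustration, not part of the TEI.

```xml
<specGrp xmlns="http://www.tei-c.org/ns/1.0"
         xmlns:rng="http://relaxng.org/ns/structure/1.0"
         xml:id="sketch.specs">
 <!-- Hypothetical attribute class: members gain a status attribute -->
 <classSpec ident="att.myStatus" type="atts" mode="add">
  <desc>provides an attribute recording editorial status.</desc>
  <attList>
   <attDef ident="status" usage="opt">
    <desc>records the editorial status of the element.</desc>
    <datatype><rng:text/></datatype>
   </attDef>
  </attList>
 </classSpec>
 <!-- Hypothetical macro: a reusable content pattern -->
 <macroSpec ident="macro.myPlainText" mode="add">
  <desc>defines a content model consisting of plain text only.</desc>
  <content><rng:text/></content>
 </macroSpec>
 <!-- Hypothetical element that uses both of the above -->
 <elementSpec ident="myNote" ns="http://example.org/ns" mode="add">
  <desc>contains a short local note.</desc>
  <classes>
   <memberOf key="att.myStatus"/>
  </classes>
  <content><rng:ref name="macro.myPlainText"/></content>
 </elementSpec>
</specGrp>
```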

1.6. Common Elements (1)

  • Description:
    • <remarks> (any commentary or discussion about the usage of an element, attribute, or class)
    • <listRef> (a list of significant references to places where this element is discussed)
  • Examples
    • <exemplum> (a single example demonstrating the use of an element)
    • <eg> (any kind of illustrative example)
    • <egXML> (a single well-formed XML example demonstrating the use of some XML element or attribute)
  • Classification
    • <classes> (the classes of which the element or class is a member)
    • <memberOf> (class membership of the parent element or class)
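
A sketch of how the description, example and classification elements sit inside a specification. The recipe element, its namespace and the #cookbook-intro target are invented for the example; the child order follows the TEI P5 content model as far as I am aware, so check it against the schema you validate with.

```xml
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             xmlns:rng="http://relaxng.org/ns/structure/1.0"
             ident="recipe" ns="http://example.org/ns/cookbook"
             mode="add">
 <desc>contains a single recipe.</desc>
 <!-- Classification: where the element may appear -->
 <classes>
  <memberOf key="model.divPart"/>
 </classes>
 <content><rng:text/></content>
 <!-- Example: egXML keeps sample markup in its own namespace -->
 <exemplum>
  <egXML xmlns="http://www.tei-c.org/ns/Examples">
   <recipe>Beat two eggs; fry gently.</recipe>
  </egXML>
 </exemplum>
 <!-- Description: usage notes and pointers to discussion -->
 <remarks>
  <p>Use one <gi>recipe</gi> per dish.</p>
 </remarks>
 <listRef>
  <ptr target="#cookbook-intro"/>
 </listRef>
</elementSpec>
```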

1.7. Common Elements (2)

  • Element Specifications
    • <content> (the text of a content model for the schema)
    • <attList> (documentation for all the attributes associated with this element, as a series of <attDef> elements)
  • Attributes
    • <attDef> (definition of a single attribute)
    • <datatype> (schema datatype for the attribute value)
    • <defaultVal> (default declared attribute value)
    • <valDesc> (description of any attribute value)
    • <valList> (a list of attribute value items)
    • <valItem> (a single attribute value item)
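
The attribute-related elements could be combined along these lines. Again a sketch: the recipe element and its course and serves attributes are hypothetical, and data.count is assumed to be available as a TEI datatype macro in the same way as the data.pointer used later in these slides.

```xml
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             xmlns:rng="http://relaxng.org/ns/structure/1.0"
             ident="recipe" ns="http://example.org/ns/cookbook"
             mode="add">
 <desc>contains a single recipe.</desc>
 <content>
  <rng:oneOrMore>
   <rng:ref name="model.pLike"/>
  </rng:oneOrMore>
 </content>
 <attList>
  <!-- Closed value list with a declared default -->
  <attDef ident="course" usage="opt">
   <desc>indicates the course the dish belongs to.</desc>
   <defaultVal>main</defaultVal>
   <valList type="closed">
    <valItem ident="starter"><gloss>served first</gloss></valItem>
    <valItem ident="main"><gloss>the main dish</gloss></valItem>
    <valItem ident="dessert"><gloss>served last</gloss></valItem>
   </valList>
  </attDef>
  <!-- Datatype plus a prose description of legal values -->
  <attDef ident="serves" usage="opt">
   <desc>gives the number of portions.</desc>
   <datatype><rng:ref name="data.count"/></datatype>
   <valDesc>a whole number of portions</valDesc>
  </attDef>
 </attList>
</elementSpec>
```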

2. TEI ODD

The TEI uses the Documentation Elements to write the TEI Guidelines. The associated schemas and element reference documentation are generated from the Guidelines themselves.

2.1. Important ODD concepts

The TEI's literate programming with ODD (One Document Does it all) provides:
  • Schema specification
  • User oriented documentation
  • Modularity: all specifications pertaining to a coherent sub-domain of the TEI
  • Classes: identifying shared behaviours or semantics
  • Extensibility: a consequence of the above mechanisms

2.2. The TEI ODD in practice

The TEI Guidelines, their schema, and their schema fragments are all produced from a single XML resource containing:
  1. Descriptive prose (lots of it)
  2. Examples of usage (plenty)
  3. Formal declarations for components of the TEI Abstract Model:
    • elements and attributes
    • modules
    • classes and macros

2.3. Possibilities of customizing the TEI

The TEI has over 20 modules. A working project will:
  • Choose the modules it needs
  • Probably narrow the set of elements within each module
  • Probably add local datatype constraints
  • Possibly add new elements/attributes in other namespaces
  • Possibly localize the names of elements

2.4. Under the hood

TEI customizations are themselves expressed in TEI XML, using elements from the tagdocs module mentioned above.

For example:
<schemaSpec ident="myTEIlite">
 <desc>This is TEI Lite with simplified heads</desc>
 <moduleRef key="tei"/>
 <moduleRef key="core"/>
 <moduleRef key="textstructure"/>
 <moduleRef key="header"/>
 <moduleRef key="linking"/>
 <elementSpec ident="head" mode="change">
  <content><rng:text/></content>
 </elementSpec>
</schemaSpec>

This produces something like TEI Lite, with a slight change: <head> may contain only text.
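
Two of the other customizations listed in 2.3 can be expressed in the same way. The following is a sketch under assumptions: the schema name myTEIlocal and the German name absatz are invented, and data.count is assumed to be the TEI datatype macro for whole numbers.

```xml
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0"
            xmlns:rng="http://relaxng.org/ns/structure/1.0"
            ident="myTEIlocal">
 <desc>A sketch with one renamed element and one tightened attribute</desc>
 <moduleRef key="tei"/>
 <moduleRef key="core"/>
 <moduleRef key="textstructure"/>
 <moduleRef key="header"/>
 <!-- Localize a name: documents use absatz instead of p -->
 <elementSpec ident="p" module="core" mode="change">
  <altIdent>absatz</altIdent>
 </elementSpec>
 <!-- Local datatype constraint: n on pb must be a whole number -->
 <elementSpec ident="pb" module="core" mode="change">
  <attList>
   <attDef ident="n" mode="change">
    <datatype><rng:ref name="data.count"/></datatype>
   </attDef>
  </attList>
 </elementSpec>
</schemaSpec>
```

Only the parts that differ from the TEI defaults are stated; everything else about <p> and <pb/> is inherited from the core module.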

2.5. ODD processors

  • The TEI maintains a library of XSLT scripts that can generate
    • The TEI Guidelines in canonical TEI XML format
    • The Guidelines in HTML or PDF
    • RELAX NG, DTD, or W3C schema fragments
  • The same library is used by the customization layer to generate
    • project-specific documentation
    • project-specific schemas
    • translations into other (human) languages
  • We use eXist as a database for extracting material from the P5 sources

2.6. The TEI abstract model

  • The TEI abstract model sees a markup scheme (a schema) as consisting of a number of discrete modules, which can be combined more or less as required.
  • A schema is made by combining references to modules and optional element over-rides or additions
  • Each element declares the module it belongs to: elements cannot appear in more than one module.
  • Each module extends the range of elements and attributes available by adding new members to existing classes of elements, or by defining new classes.

2.7. Expression of TEI content models

Within the class system, TEI elements have to be defined using some language notation; choices include:
  1. using XML DTD language (as in older versions of the TEI)
  2. using W3C Schema language
  3. using the RELAX NG schema language
  4. inventing an entirely new abstract language for later transformation to specific schema language
We chose a combination of 3 and 4: using our own abstract language, but switching to RELAX NG for content modelling.
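
A minimal sketch of what that combination looks like in one specification. The caption element and its namespace are hypothetical; model.phrase is assumed to be an existing TEI model class.

```xml
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             xmlns:rng="http://relaxng.org/ns/structure/1.0"
             ident="caption" ns="http://example.org/ns" mode="add">
 <!-- TEI abstract language: identity, description, mode -->
 <desc>contains a short caption made of text and phrases.</desc>
 <!-- RELAX NG: the content model itself -->
 <content>
  <rng:zeroOrMore>
   <rng:choice>
    <rng:text/>
    <!-- a reference to a TEI class, resolved by the ODD processor -->
    <rng:ref name="model.phrase"/>
   </rng:choice>
  </rng:zeroOrMore>
 </content>
</elementSpec>
```

The wrapper is TEI, the pattern inside <content> is plain RELAX NG, and the class name inside <rng:ref> is where the two meet.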

2.8. Why that combination?

  • Expressing constraints in XML language is too attractive to forgo
  • There is a clamour for better datatyping than DTDs have
  • The schema languages are so good, it is silly to reinvent them
  • But we like our class system and literate programming

2.9. DTD vs RELAX NG vs W3C Schema

  • DTDs are not XML, and need specialist software
  • W3C schema is not consistently implemented, its documentation is vast and confusing, and it looks over-complex
  • RELAX NG on the other hand…
    • uncluttered design
    • good documentation
    • multiple open source 100%-complete implementations
    • ISO standard
    • useful features for multipurpose structural validation
No contest…

2.10. An Example ODD

<elementSpec module="spoken" ident="pause">
 <classes>
  <memberOf key="model.divPart.spoken"/>
  <memberOf key="att.timed"/>
  <memberOf key="att.typed"/>
 </classes>
 <content> <rng:empty/>
 </content>
 <attList>
  <attDef ident="who" usage="opt">
   <gloss>A unique identifier</gloss>
   <desc>supplies the identifier of the person or group pausing.
       Its value is the identifier of a <gi>person</gi> or <gi>persGrp</gi>
       element in the TEI header.</desc>
   <datatype><rng:ref name="data.pointer"/></datatype>
  </attDef>
 </attList>
 <desc>a pause either between or within utterances.</desc>
</elementSpec>

2.11. From which we generate: RNC

element pause { pause.content, pause.attributes }
pause.content = empty
pause.attributes =
att.global.attributes,
att.timed.attributes,
att.typed.attributes,
att.ascribed.attributes,
model.divPart.spoken |= pause
att.timed |= pause
att.typed |= pause
att.ascribed |= pause

2.12. Or DTD

<!ELEMENT %n.pause; %om.RR; EMPTY>
<!ATTLIST %n.pause;
 %att.global.attributes;
 %att.timed.attributes;
 %att.typed.attributes;
 %att.ascribed.attributes;>
<!ENTITY % model.divPart.spoken
 "%x.model.divPart.spoken; %n.event; | %n.kinesic; | %n.pause; | %n.shift; | %n.u; | %n.vocal; | %n.writing;">

2.13. Or documentation

2.14. Overriding an attribute value-list in a TEI ODD

<elementSpec ident="list" module="core">
 <classes>
  <memberOf key="att.typed"/>
 </classes>
 <attList>
  <attDef ident="type" mode="replace">
   <valList type="closed">
    <valItem ident="ordered">
     <gloss>Items are ordered</gloss>
    </valItem>
    <valItem ident="bulleted">
     <gloss>Items are bulleted</gloss>
    </valItem>
    <valItem ident="frabjous">
     <gloss>Items are frabjous</gloss>
    </valItem>
   </valList>
  </attDef>
 </attList>
</elementSpec>

2.15. Modifying TEI objects

Understanding classes is critical.
  • They group together elements with the same role in the TEI architecture
  • They group together elements with the same syntactic behaviour
  • Classes can provide attributes for groups of like-minded elements
  • The elements in the class will appear in the same content models
The class defines a group of elements belonging to the same family of concepts; elements declare themselves as belonging to a class.

2.16. Uniformity of description

  • modules, elements, attributes, value-lists are treated uniformly
  • each has an identifier, a gloss, a description, and one or more equivalents
  • each can be added, changed, replaced, deleted within a given context
  • for example, membership in the att.typed class gives you a generic type attribute, which can be over-ridden for specific class members
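
Because classes are treated like everything else, a class can itself be changed in a customization. The sketch below assumes that att.typed lives in the tei module and also supplies a subtype attribute; check both against the version of the Guidelines you are using. The schema name myTEItyped is invented.

```xml
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="myTEItyped">
 <desc>A sketch that removes one attribute from a class</desc>
 <moduleRef key="tei"/>
 <moduleRef key="core"/>
 <moduleRef key="textstructure"/>
 <moduleRef key="header"/>
 <!-- Change the class once; every member element is affected -->
 <classSpec ident="att.typed" module="tei" type="atts" mode="change">
  <attList>
   <!-- assumed attribute: dropped for all members of the class -->
   <attDef ident="subtype" mode="delete"/>
  </attList>
 </classSpec>
</schemaSpec>
```

This is the class-wide counterpart of the per-element override in 2.14: the same mode values are used, only the context differs.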

3. Roma

But the TEI knows you don't necessarily want to write TEI ODDs by hand in order to customize the TEI! So it provides Roma, a command-line script with a corresponding web front-end, to help you do this.

The people behind Roma are:
  • Arno Mittelbach: initial programming
  • Sebastian Rahtz: maintenance and frequent improvements
  • Ioan Bernevig: a 'Sanity Checker' addition
Roma is available from the TEI's SourceForge SVN repository, and I'm sure Sebastian would be happy for others to provide further enhancements.

3.1. How to use the TEI

Imagine that you have seen your colleague next door doing some encoding with the TEI and want to do the same thing:
  • Go to Roma at http://tei.oucs.ox.ac.uk/Roma/
  • Toy with the user profile [Customize]
  • Generate a schema [Schema]
  • Make a trial with the editor, creating a simple document
  • Go back to Roma and make basic documentation

3.2. Roma: New

3.3. Roma: Customize

3.4. Roma: Schema

3.5. Roma: Documentation

3.6. Subsetting the TEI

Suppose you now feel you want to use some more of the TEI, but not all of it:

  • Go to Roma…
  • Look at [Modules]
  • Explore default modules by pointing to main elements (by order of interest). You can throw away most things, but
    • In textstructure, you should really keep <TEI>, <text>, <body> and <div>
    • In core, most people need <p>, <q>, <list>, <pb/> and <head>
    • From header, keep everything unless you really understand the details
  • Start checking out elements
  • Make editorial choices (numbered vs. unnumbered divs)
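
Behind the scenes, the choices made in the steps above end up in an ODD along these lines. This is a hand-written sketch, not actual Roma output: the schema name mySubset is invented, and the choice of which elements to drop is only an example.

```xml
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="mySubset">
 <desc>A sketch of a subset with unnumbered divs and no drama</desc>
 <moduleRef key="tei"/>
 <moduleRef key="header"/>
 <moduleRef key="core"/>
 <moduleRef key="textstructure"/>
 <!-- Editorial choice: unnumbered divs only -->
 <elementSpec ident="div1" module="textstructure" mode="delete"/>
 <elementSpec ident="div2" module="textstructure" mode="delete"/>
 <elementSpec ident="div3" module="textstructure" mode="delete"/>
 <elementSpec ident="div4" module="textstructure" mode="delete"/>
 <elementSpec ident="div5" module="textstructure" mode="delete"/>
 <elementSpec ident="div6" module="textstructure" mode="delete"/>
 <elementSpec ident="div7" module="textstructure" mode="delete"/>
 <!-- Core elements this project does not need -->
 <elementSpec ident="sp" module="core" mode="delete"/>
 <elementSpec ident="speaker" module="core" mode="delete"/>
 <elementSpec ident="stage" module="core" mode="delete"/>
</schemaSpec>
```

The header module is referenced without any deletions, in line with the advice to keep everything from it unless you really understand the details.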

3.7. Roma: Modules

3.8. Roma: Change Module

3.9. Roma: Change Attributes

3.10. Roma: Change Attribute Values

3.11. Roma: Change Language

3.12. Roma: Sanity Checker

3.13. A word of caution

Remember
  • The TEI is not a monolithic environment
  • Very few things are really mandatory …
  • …but the TEI is more than just a market place
  • Basic document structure must be preserved
The TEI is a powerful environment for working with elements and producing documentation, but do not abuse it.

3.14. Next...?

Next James will lead us in an exercise in using Roma.



James Cummings. Date: 2007-10-31
Copyright University of Oxford