IT Services, University of Oxford

1. Some useful TEI modules

  • Analysis
  • Linking
  • Tables
  • Graphics

2. Analysis

  • associating simple analyses and interpretations with text elements
  • semantic or syntactic interpretations which an encoder wishes to attach to all or part of a text
  • mainly covering linguistic information
  • as often in the TEI, you can do the same thing in many ways:
    • using generic <seg> elements with type attributes
    • using the straightforward canned analyses described here
    • using the more powerful and general TEI Feature Structures

3. Linguistic units

To mark up text for linguistic purposes:
<s> (s-unit) contains a sentence-like division of a text.
<cl> (clause) represents a grammatical clause.
<phr> (phrase) represents a grammatical phrase.
<w> (word) represents a grammatical (not necessarily orthographic) word.
<m> (morpheme) represents a grammatical morpheme.
<c> (character) represents a character.
From the att.segLike class, these elements all have type and function attributes

4. Example of linguistic markup

<u>Like a suck of one of my sweets?</u>
<u>No I don't take sweets from strangers, oh God</u>
<u who="PS1K5">
 <s n="5963">
  <w type="AV0">Like </w>
  <w type="AT0">a </w>
  <w type="NN1">suck </w>
  <w type="PRF">of </w>
  <w type="CRD">one </w>
  <w type="PRF">of </w>
  <w type="DPS">my </w>
  <w type="NN2">sweets</w>
 </s>
</u>
<u trans="smooth" who="PS1BY">
 <s n="5964">
  <w type="ITJ">No </w>
  <w type="PNP">I </w>
  <w type="VDB">do</w>
  <w type="XX0">n't </w>
  <w type="VVI">take </w>
  <w type="NN2">sweets </w>
  <w type="PRP">from </w>
  <w type="NN2">strangers</w>
  <c type="PUN">, </c>
  <w type="ITJ">oh </w>
  <w type="NP0">God</w>
 </s>
</u>
(from British National Corpus, KSV 5963)
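Because every token carries its part-of-speech code in type, simple processing can recover both the tagged tokens and the running text. The following is a minimal Python sketch using only the standard library's ElementTree; it assumes the second utterance is parsed as a standalone fragment without the TEI namespace, and the function names are hypothetical. Note how "do" and "n't" come back as two grammatical words although they form one orthographic word.

```python
import xml.etree.ElementTree as ET

# The second utterance from the BNC example (no TEI namespace, for brevity).
UTTERANCE = """<u trans="smooth" who="PS1BY">
 <s n="5964">
  <w type="ITJ">No </w>
  <w type="PNP">I </w>
  <w type="VDB">do</w>
  <w type="XX0">n't </w>
  <w type="VVI">take </w>
  <w type="NN2">sweets </w>
  <w type="PRP">from </w>
  <w type="NN2">strangers</w>
  <c type="PUN">, </c>
  <w type="ITJ">oh </w>
  <w type="NP0">God</w>
 </s>
</u>"""

TOKEN_TAGS = ("w", "c")


def tagged_words(xml_text):
    """Return (token, POS code) pairs for every <w> and <c>, in document order."""
    root = ET.fromstring(xml_text)
    return [((el.text or "").strip(), el.get("type"))
            for el in root.iter() if el.tag in TOKEN_TAGS]


def plain_text(xml_text):
    """Rebuild the running text; the spaces kept inside <w> restore word spacing."""
    root = ET.fromstring(xml_text)
    return "".join(el.text or "" for el in root.iter() if el.tag in TOKEN_TAGS)


print(tagged_words(UTTERANCE)[2:4])  # [('do', 'VDB'), ("n't", 'XX0')]
print(plain_text(UTTERANCE))         # No I don't take sweets from strangers, oh God
```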

5. Mixing analysis with structure

Analytic units often cross structural boundaries. The <cl> (clause) elements here cross the verse lines (<l>). We can use the part attribute to show how a <cl> can be assembled:
<div type="stanza">
  <l><cl part="I">Tweedledum and Tweedledee</cl></l>
  <l><cl part="F">Agreed to have a battle;</cl></l>
  <l><cl part="I">For Tweedledum said Tweedledee</cl></l>
  <l><cl part="F">Had spoiled his nice new rattle.</cl></l>
</div>
An alternative is to use the next and prev attributes, which point from each fragment of a clause to the fragment that continues or precedes it:
<l>
 <cl xml:id="c1" next="#c3">For Tweedledum said
  <cl xml:id="c2" next="#c4">Tweedledee</cl>
 </cl>
</l>
<l>
 <cl xml:id="c3" prev="#c1">
  <cl xml:id="c4" prev="#c2">Had spoiled his nice new rattle.</cl>
 </cl>
</l>
(as usual, there are other ways in the TEI to deal with multiple hierarchies)
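To make the part mechanism concrete, here is a minimal Python sketch that reassembles whole clauses from their fragments in the Tweedledum stanza. It assumes the fragments are not nested inside one another and that initial, medial and final parts appear in document order; the function name assemble is hypothetical.

```python
import xml.etree.ElementTree as ET

STANZA = """<div type="stanza">
  <l><cl part="I">Tweedledum and Tweedledee</cl></l>
  <l><cl part="F">Agreed to have a battle;</cl></l>
  <l><cl part="I">For Tweedledum said Tweedledee</cl></l>
  <l><cl part="F">Had spoiled his nice new rattle.</cl></l>
</div>"""


def assemble(xml_text, tag="cl"):
    """Join fragments marked part="I" (initial), "M" (medial) and "F" (final).

    Any other part value (such as the default "N") is treated as a complete
    unit; part="Y" (incomplete, position unknown) is not handled here.
    """
    root = ET.fromstring(xml_text)
    units, pending = [], []
    for el in root.iter(tag):
        text = " ".join("".join(el.itertext()).split())  # normalise whitespace
        part = el.get("part", "N")
        if part == "I":
            pending = [text]
        elif part == "M":
            pending.append(text)
        elif part == "F":
            units.append(" ".join(pending + [text]))
            pending = []
        else:
            units.append(text)
    return units


for clause in assemble(STANZA):
    print(clause)
# Tweedledum and Tweedledee Agreed to have a battle;
# For Tweedledum said Tweedledee Had spoiled his nice new rattle.
```

The next/prev alternative needs no ordering assumption, since each fragment names its continuation explicitly; the price is that every fragment needs an xml:id.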

6. Stand-off interpretation

When inline markup is inappropriate, the <span> element can be used to make ad hoc remarks about bits of text, linked to by ID. As usual, <spanGrp> is available to group assertions together.

 <ab xml:id="eye_start">Lest it see more, prevent it.
   Out, vile jelly!</ab>
 <ab>Where is thy lustre now?</ab>
 <ab>All dark and comfortless. Where's my son Edmund?</ab>
 <ab>Edmund, enkindle all the sparks of nature,</ab>
 <ab xml:id="eye_end">To quit this horrid act.</ab>
<span from="#eye_start" to="#eye_end">the eye is pulled out</span>
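A processing application has to turn the from and to pointers back into the stretch of text they delimit. The sketch below does this for the simple case shown here, where both pointers are same-document "#id" references to sibling elements; the <div> wrapper and the function name span_blocks are assumptions for the illustration.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

SCENE = """<div>
 <ab xml:id="eye_start">Lest it see more, prevent it.
   Out, vile jelly!</ab>
 <ab>Where is thy lustre now?</ab>
 <ab>All dark and comfortless. Where's my son Edmund?</ab>
 <ab>Edmund, enkindle all the sparks of nature,</ab>
 <ab xml:id="eye_end">To quit this horrid act.</ab>
 <span from="#eye_start" to="#eye_end">the eye is pulled out</span>
</div>"""


def span_blocks(root, span):
    """Return the sibling elements covered by a <span>, from and to inclusive.

    A span without a to attribute covers only its from element.
    """
    start = span.get("from").lstrip("#")
    end = span.get("to", span.get("from")).lstrip("#")
    covered, inside = [], False
    for el in root:
        if el.get(XML_ID) == start:
            inside = True
        if inside:
            covered.append(el)
        if el.get(XML_ID) == end:
            break
    return covered


root = ET.fromstring(SCENE)
for span in root.iter("span"):
    blocks = span_blocks(root, span)
    print(f"{span.text}: {len(blocks)} blocks")  # the eye is pulled out: 5 blocks
```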

7. Stand-off interpretation (cont)

The <interp> element is used to encode an interpretation. The global ana attribute can point from the text to such an interpretation:
 <ab ana="#blind">Lest it see more, prevent it.
   Out, vile jelly!</ab>
 <ab>Where is thy lustre now?</ab>
 <ab>All dark and comfortless. Where's my son Edmund?</ab>
 <ab>Edmund, enkindle all the sparks of nature,</ab>
 <ab>To quit this horrid act.</ab>
<interp resp="#SPQR" xml:id="blind">removal of eyes</interp>

The <interpGrp> element is used to group interpretations together.

8. More complex interpretation example

In this example:
  • A set of possible interpretations is defined, using <interp> elements
  • <seg> is used to mark up distinct portions of a narrative
  • <s> is used to mark sentences
  • <anchor> is used to mark milestones
  • the ana attribute links sections or milestones to the appropriate interpretation
<p xml:id="PP1">
 <interpGrp resp="#TMA" type="structuralUnit">
  <interp xml:id="INTRO">introduction</interp>
  <interp xml:id="CONFLICT">conflict</interp>
  <interp xml:id="CLIMAX">climax</interp>
  <interp xml:id="REVENGE">revenge</interp>
  <interp xml:id="RECONCIL">reconciliation</interp>
  <interp xml:id="AFTERM">aftermath</interp>
 </interpGrp>
 <seg xml:id="SS1-SS3" ana="#INTRO">
  <s xml:id="SS1">Sigmund ... was a king in Frankish country.</s>
  <s xml:id="SS2">Sinfiotli was the eldest of his sons.</s>
  <s xml:id="SS3">Borghild, Sigmund's wife, had a brother ... </s>
 </seg>
 <s xml:id="SS4A" ana="#CONFLICT">But Sinfiotli ... wooed the same woman</s>
 <s xml:id="SS4B" ana="#CLIMAX">and Sinfiotli killed him over it.</s>
 <seg xml:id="SS5-SS17" ana="#REVENGE">
  <s xml:id="SS5">And when he came home, ... she was obliged to accept it.</s>
  <s xml:id="SS6">At the funeral feast Borghild was serving beer.</s>
  <s xml:id="SS17">Sinfiotli drank it off and at once fell dead.</s>
 </seg>
</p>
<anchor xml:id="NIL1" ana="#RECONCIL"/>
<p xml:id="PP2">Sigmund carried him a long way in his arms ... </p>
<p xml:id="PP3">King Sigmund lived a long time in Denmark ... </p>
<p xml:id="PP4">Sigmund and all his sons were tall ... </p>
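Since ana is just a list of pointers, resolving it is a lookup from IDs to <interp> elements. The following Python sketch works on a shortened copy of the example (sentence text abbreviated, only three interpretations kept); the function name interpretations is hypothetical, and a "?" prefix is this sketch's own way of flagging a pointer with no matching <interp>.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

STORY = """<div>
 <p xml:id="PP1">
  <interpGrp resp="#TMA" type="structuralUnit">
   <interp xml:id="INTRO">introduction</interp>
   <interp xml:id="CONFLICT">conflict</interp>
   <interp xml:id="RECONCIL">reconciliation</interp>
  </interpGrp>
  <seg xml:id="SS1-SS3" ana="#INTRO">
   <s xml:id="SS1">Sigmund ... was a king in Frankish country.</s>
  </seg>
  <s xml:id="SS4A" ana="#CONFLICT">But Sinfiotli ... wooed the same woman</s>
 </p>
 <anchor xml:id="NIL1" ana="#RECONCIL"/>
 <p xml:id="PP2">Sigmund carried him a long way in his arms ... </p>
</div>"""


def interpretations(root):
    """Map the xml:id of each element carrying ana to the labels it points to."""
    labels = {el.get(XML_ID): el.text for el in root.iter("interp")}
    result = {}
    for el in root.iter():
        if el.get("ana"):
            # ana may hold several space-separated pointers, e.g. ana="#A #B".
            ids = [ref.lstrip("#") for ref in el.get("ana").split()]
            result[el.get(XML_ID)] = [labels.get(i, "?" + i) for i in ids]
    return result


for ident, labs in interpretations(ET.fromstring(STORY)).items():
    print(ident, labs)
# SS1-SS3 ['introduction']
# SS4A ['conflict']
# NIL1 ['reconciliation']
```

Reporting unresolved pointers rather than silently dropping them is a cheap way to catch ana values that refer to IDs nobody defined.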

9. Linking, segmentation and alignment

In some texts we need to be able
  • to link disparate elements without using the xml:id attribute;
  • to segment text into elements and to mark arbitrary points within documents
  • to represent correspondence or alignment among groups of text elements
  • to synchronize elements of a text, representing temporal correspondences and alignments among text elements
  • to specify that one text element is identical to or a copy of another
  • to aggregate possibly noncontiguous elements
  • to specify that different elements are alternatives to one another and to express preferences among the alternatives
  • to store markup separately from the data it describes

10. Linking: underlying assumptions

  • Use W3C identifying, pointing and linking mechanisms where possible
  • Use xml:id to identify an element directly
  • Use XPointer to point to elements that do not have an xml:id

11. Complex pointing

The standard URI scheme allows for pointing
  • to documents other than the current document
  • to a particular element in a document other than the current document using its xml:id;
but we also need to point
  • to a particular element using its position in the XML element tree (standard XPointer schemes)
  • at arbitrary content in any XML document using TEI-defined XPointer schemes

12. Some XPointer schemes

From http://www.w3.org/2005/04/xpointer-schemes/; ones marked with a ➠ were specified by the TEI itself:
element
Identify elements by position within parent, recursively.
➠ left
Locates the point immediately preceding its argument. The sole argument is a pointer, which is treated as if it were a fragment identifier itself. The argument may return a node, node set, range, or point.
➠ match
Takes as arguments a pointer, a string, and an optional integer. Designates the result of a literal match of the argument string within the string-value of the pointer argument.
➠ range
Locates a range between two points in an XML information set. Takes two pointer arguments which locate the boundaries of the range by two points, and are interpreted as fragment identifiers.
➠ right
Locates the point immediately following its argument. The sole argument is a pointer, which is treated as if it were a fragment identifier itself.
➠ string-range
Locates a range based on character positions. Takes three arguments: a pointer, an offset, and a length.
xmlns
Bind a prefix for use in subsequent pointer parts e.g. xmlns(xs=http://www.w3.org/2001/XMLSchema)
xpath1
Locates a node or node set within an XML Information Set. The single argument is an XPath path as defined in the W3C XPath 1 Recommendation.
xpath2
Locates a node or node set within an XML Information Set. The single argument is an XPath path as defined in the W3C XPath 2 Recommendation.
xpointer
The rich scheme including XPaths and ranges described in the XPtr-xpointer Working draft

13. Test document for XPointer schemes

<!-- seven divs here -->
   <div xml:id="lastterm">
    <p><emph>‘But’</emph>, said
    <name key="Stalky">Stalky</name>,
         ‘come to think of it, we've done more giddy
         jesting with the Sixth since we've been
         passed over than any one else in the last
         seven years.’</p>
   </div>

14. Examples for XPointer schemes

<ptr target="stalky.xml#element(lastterm)"/>
<ptr target="stalky.xml#element(/1/1/8)"/>
xpointer() and xmlns()
<ptr target="stalky.xml#xmlns(t=http://www.tei-c.org/ns/1.0)

Note that the last expression returns multiple nodes.
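The element() scheme is simple enough to implement directly. In its grammar, data is either an ID optionally followed by a child sequence, or a child sequence on its own, which must begin with a slash (so a full path from the document element reads /1/...). The Python sketch below resolves both forms; it treats xml:id as the ID attribute, and its test document wraps the divs in <text><body>, which is an assumption since the slide elides everything but the last div. The function name resolve_element_scheme is hypothetical.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical test document: seven empty divs, then the "lastterm" div.
DOC = ("<text><body>"
       + "".join(f'<div xml:id="d{i}"/>' for i in range(1, 8))
       + '<div xml:id="lastterm"><p>‘But’, said <name key="Stalky">Stalky</name>, …</p></div>'
       + "</body></text>")


def resolve_element_scheme(root, data):
    """Resolve element() data such as "lastterm", "/1/1/8" or "lastterm/1".

    Child-sequence steps count element children from 1. A sequence that does
    not start from an ID must begin with "/", and its first step must be 1
    (the document element). Returns None when nothing matches.
    """
    steps = data.split("/")
    if steps[0]:  # starts with an ID
        node = next((el for el in root.iter() if el.get(XML_ID) == steps[0]), None)
        rest = steps[1:]
    else:  # pure child sequence
        node = root if len(steps) > 1 and steps[1] == "1" else None
        rest = steps[2:]
    for step in rest:
        if node is None:
            break
        children = list(node)
        index = int(step)
        node = children[index - 1] if 1 <= index <= len(children) else None
    return node


root = ET.fromstring(DOC)
print(resolve_element_scheme(root, "lastterm").get(XML_ID))  # lastterm
print(resolve_element_scheme(root, "/1/1/8") is resolve_element_scheme(root, "lastterm"))  # True
```

Position-based pointers like /1/1/8 break as soon as someone inserts a div; that fragility is the main argument for giving elements an xml:id where you can.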

15. A daily use for XPointer

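The W3C XInclude specification is a good way to write composite documents; the <include> element's xpointer attribute lets you include just part of another document (XInclude 1.0 treats a fragment identifier in href as a fatal error):
<div xmlns:xi="http://www.w3.org/2001/XInclude">
<xi:include href="stalky.xml" xpointer="…"/>
</div>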

16. Generic linking

The core TEI <ptr> and <ref> elements let you do the point to point linking we are used to on web pages, relying on XML IDs for internal links:
<p>Wikipedia has a good starter page on
<ref target="…">waving cats</ref>, with links to more esoteric
resources; our own pictures are in
section <ref target="#cats">3</ref>.</p>
The linking module adds <link> to let you specify a point to point relationship between two or more elements:
<p xml:id="beetle1">You're a despondin' brute, …</p>
<p xml:id="beetle2">An' who the dooce is this
Raymond Martin, M.P.?’ demanded Beetle</p>
<link targets="#beetle1 #beetle2"/>
Note that this is establishing a connection, not a direction.

17. Groups of links

<linkGrp> is provided to group together sets of <link>s. In the following example, it allows for stand-off notes, and characterisation of those notes:
<l xml:id="l2.79">A place there is, betwixt earth, air
and seas</l>
<l xml:id="l2.80">Where from Ambrosia, Jove retires
for ease.</l>
<l xml:id="l2.88">Sign'd with that Ichor which from Gods distills.</l>
<note xml:id="n2.79">
 <bibl>Ovid Met. 12.</bibl>
 <quote xml:lang="la">
  <l>Orbe locus media est, inter terrasq; fretumq;</l>
  <l>Cœlestesq; plagas —</l>
 </quote>
</note>
<note xml:id="n2.88">Alludes to <bibl>Homer, Iliad 5</bibl></note>
<linkGrp type="imitationnotes">
 <link targets="#n2.79 #l2.79"/>
 <link targets="#n2.88 #l2.88"/>
</linkGrp>
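Stand-off links only become useful once software follows them. This Python sketch, on a trimmed copy of the example, pairs each annotated line with its note. Because <link> asserts a connection rather than a direction, treating the first target as the note and the second as the line is purely a convention of this example; the <div> wrapper and the function name linked_pairs are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

POEM = """<div>
 <l xml:id="l2.79">A place there is, betwixt earth, air and seas</l>
 <l xml:id="l2.80">Where from Ambrosia, Jove retires for ease.</l>
 <l xml:id="l2.88">Sign'd with that Ichor which from Gods distills.</l>
 <note xml:id="n2.79"><bibl>Ovid Met. 12.</bibl></note>
 <note xml:id="n2.88">Alludes to <bibl>Homer, Iliad 5</bibl></note>
 <linkGrp type="imitationnotes">
  <link targets="#n2.79 #l2.79"/>
  <link targets="#n2.88 #l2.88"/>
 </linkGrp>
</div>"""


def linked_pairs(root):
    """Yield (linkGrp type, [targeted elements]) for every <link> in a <linkGrp>."""
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    for grp in root.iter("linkGrp"):
        for link in grp.iter("link"):
            targets = [by_id[ref.lstrip("#")] for ref in link.get("targets").split()]
            yield grp.get("type"), targets


root = ET.fromstring(POEM)
for kind, (note, line) in linked_pairs(root):
    # itertext() flattens the note, including its nested <bibl>.
    note_text = "".join(note.itertext()).strip()
    print(f"{kind}: {line.text!r} <- {note_text}")
```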

18. Segmenting text, and marking arbitrary points within documents

This module adds three useful new elements:
<ab> (anonymous block) marks a block of text with no special semantic interpretation
<seg> (segment) marks a range of text with no special semantic interpretation
<anchor> marks an arbitrary point in the text
The first two have helpful type and subtype attributes.

19. Marking points

<anchor> is comparable to HTML anchors:
<p>He was merely working up to a peroration, and the
boys knew it; but McTurk cut through the frothing
sentence, the others echoing:</p>
<p><anchor xml:id="MTu"/>I appeal to the Head, sir.’</p>
<p><anchor xml:id="Be"/>I appeal to the head, sir.’</p>
<p><anchor xml:id="St"/>I appeal to the Head, sir.’</p>
<p>It was their unquestioned right. Drunkenness meant
expulsion after a public flogging. They had been
accused of it. The case was the Head's, and the
Head's alone.</p>

20. Anonymous blocks

In this inscription, there are separate lines, but they are not poetry, or paragraphs, so we isolate them with <ab>:
 <ab>JOSEPH STORY</ab>
 <ab>ONLY SON OF</ab>
 <ab>WILLIAM W. AND EMELYN STORY</ab>
 <ab>BORN MAY 3rd 1847</ab>
 <ab>AT BOSTON U.S.A</ab>
 <ab>DIED NOV. 23rd 1853</ab>
 <ab>AT ROME</ab>

21. Segments

There are more specific elements elsewhere in the TEI for marking sentences, words and characters, but sometimes we need to mark an arbitrary span, using <seg>:
<q>Don't say <q>
  <seg type="stutter">I-I-I</seg>'m afraid,</q>
Melvin, just say <q>I'm afraid.</q></q>

22. Correspondence and alignment

First, consider the representation of a manuscript page:

<ab xml:id="N6">
 <lb/>and hat hire
don in obedience ðe cnoweð hire manere
<lb/>and hire strencðe. he mai ðe vttre
riwle chaungen efter <lb/>wisdom alse he
isihð te inre mai beon best iholden.
<anchor xml:id="N_6"/>
 <lb/>Non ancre bi
mine rede ne schal makien professiun.
<lb/>þet is. bihoten ase hest. bute
þreo þinges. þet beoð. o-<lb/>bedience.
chastete. and studestaþeluestnesse.
þet heo ne schal <lb/>þene stude neuer
more chaungen; bute vor neod one.
<lb/>alse strengðe and deaþes dred.
obedience of hire bischope; <lb/>
oþer of hire herre. vor whoa se
nimeð þing an hond and bi-<lb/>hat
hit god alse heste to donne.
heo bint hire þerto. and su-
<lb/>negeð deadliche i ðe bruche;
3if heo hit brekeð willes and
wol<lb/>des. …</ab>

23. Correspondence and alignment (cont.)

Now let's look at an edited version and a translation:

<p xml:id="edited_6">Nan ancre bi mi read ne schal
makien professiun—þet is, bihaten
ase heast—bute þreo þinges,
þet beoð obedience, chastete, ant
stude-steaðeluestnesse (þet ha ne
schal þet stude neauer mare
changin bute for nede ane,
as strengðe ant deaðes dred,
obedience of hire bischop oðer of
his herre). For hwa-se
nimeð þing on hond ant bihat hit
Godd as heast forte don hit, ha
bint hire þer-to, ant
sunegeð deadliche i þe bruche 3ef
ha hit brekeð willes.</p>
<p xml:id="translated_6">My advice
is that no anchoress should make
profession—that is, bind herself to
a vow—of more than three things,
which are obedience, chastity, and
stability of abode (that she should
never move elsewhere afterwards
unless it is absolutely necessary,
as in the case of violence and fear
of death, or obedience to her
bishop or his superior). For
whoever undertakes something and
promises God to carry it out as a
vow binds herself to it, and
commits a mortal sin if she
voluntarily breaks her vow. …</p>

24. Correspondence and alignment (cont.)

We can express a relationship between the texts as follows:
<linkGrp type="translations">
 <link targets="#edited_6 #translated_6"/>
</linkGrp>

<linkGrp type="editions">
 <link targets="#N-f2r #N6"/>
</linkGrp>

meaning ‘this paragraph in the translated edition corresponds to text at that anchor in the original’.

There are many other ways of dealing with material like this!

25. Synchronizing time-based material

If you are linking together sequences which are aligned by time, there is a special stand-off linking element <when>, grouped inside a <timeline>. It has attributes:
absolute: an absolute time for the event
interval: the length of the gap since the last event
unit: the unit of time in which the interval value is expressed
since: a link to the previous event
(note the interval/unit pair may be merged before TEI P5 is complete).
<timeline xml:id="tl1" origin="#w0" unit="ms">
 <when xml:id="w0" absolute="11:30:00"/>
 <when xml:id="w1" interval="unknown" since="#w0"/>
 <when xml:id="w2" interval="100" since="#w1"/>
 <when xml:id="w3" interval="200" since="#w2"/>
 <when xml:id="w4" interval="150" since="#w3"/>
 <when xml:id="w5" interval="250" since="#w4"/>
 <when xml:id="w6" interval="100" since="#w5"/>
</timeline>

These <when> elements can be used in a <link> to relate time events to points in the text.
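Following the since chain shows what the timeline does and does not pin down. In this Python sketch, w0 has a clock time but w1's interval is unknown, so no later event gets an absolute time, yet their spacing relative to w1 is still fully determined. It assumes unit="ms", an absolute value in HH:MM:SS form, and that each since points to an earlier <when>; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

TIMELINE = """<timeline xml:id="tl1" origin="#w0" unit="ms">
 <when xml:id="w0" absolute="11:30:00"/>
 <when xml:id="w1" interval="unknown" since="#w0"/>
 <when xml:id="w2" interval="100" since="#w1"/>
 <when xml:id="w3" interval="200" since="#w2"/>
 <when xml:id="w4" interval="150" since="#w3"/>
 <when xml:id="w5" interval="250" since="#w4"/>
 <when xml:id="w6" interval="100" since="#w5"/>
</timeline>"""


def known_interval(when):
    """Return the interval as a number, or None if it is missing or "unknown"."""
    value = when.get("interval")
    return None if value in (None, "unknown") else float(value)


def absolute_times(timeline):
    """Clock time of each <when>, or None where an unknown interval breaks the chain."""
    times = {}
    for when in timeline.iter("when"):
        if when.get("absolute"):
            times[when.get(XML_ID)] = datetime.strptime(when.get("absolute"), "%H:%M:%S")
            continue
        base = times.get((when.get("since") or "").lstrip("#"))
        interval = known_interval(when)
        times[when.get(XML_ID)] = (None if base is None or interval is None
                                   else base + timedelta(milliseconds=interval))
    return times


def offsets_ms(timeline, start_id):
    """Milliseconds from start_id to each later <when> reachable through known intervals."""
    offsets = {start_id: 0.0}
    for when in timeline.iter("when"):
        since = (when.get("since") or "").lstrip("#")
        interval = known_interval(when)
        if since in offsets and interval is not None:
            offsets[when.get(XML_ID)] = offsets[since] + interval
    return offsets


tl = ET.fromstring(TIMELINE)
print(absolute_times(tl)["w6"])  # None
print(offsets_ms(tl, "w1"))      # {'w1': 0.0, 'w2': 100.0, 'w3': 300.0, 'w4': 450.0, 'w5': 700.0, 'w6': 800.0}
```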

26. Aggregating non-contiguous elements

The <join> element is used like <link>, pointing to two or more identified fragments of text. It asserts that they can be joined to form a new virtual element, whose name is given by the result attribute. <joinGrp> is provided to aggregate <join>s.
 <l><seg xml:id="L1">E</seg>lizabeth it is in vain you say</l>
 <l>"<seg xml:id="L2">L</seg>ove not" — thou sayest it in …</l>
 <l><seg xml:id="L3">I</seg>n vain those words from thee o…</l>
 <l><seg xml:id="L4">Z</seg>antippe's talents had enforced…</l>
 <l><seg xml:id="L5">A</seg>h! if that language from thy h…</l>
 <l><seg xml:id="L6">B</seg>reath it less gently forth — a…</l>
 <l><seg xml:id="L7">E</seg>ndymion, recollect, when Luna …</l>
 <l><seg xml:id="L8">T</seg>o cure his love — was cured of…</l>
 <l><seg xml:id="L9">H</seg>is follie — pride — and passio…</l>
 <join targets="#L1 #L2 #L3 #L4 #L5 #L6 #L7 #L8 #L9" result="name">
  <desc>The beloved's name</desc>
 </join>
(from Edgar Allan Poe).
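What a processor might do with a <join> is build the virtual element it describes. This Python sketch concatenates the text of the targeted segments into an element named by result, spelling out the acrostic. The line text after each initial is abbreviated, the <lg> wrapper is assumed, and falling back to "seg" when result is absent is this sketch's own choice; realise_joins is a hypothetical name.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

POEM = """<lg>
 <l><seg xml:id="L1">E</seg>lizabeth it is in vain you say</l>
 <l>"<seg xml:id="L2">L</seg>ove not" …</l>
 <l><seg xml:id="L3">I</seg>n vain those words from thee o…</l>
 <l><seg xml:id="L4">Z</seg>antippe's talents had enforced…</l>
 <l><seg xml:id="L5">A</seg>h! if that language from thy h…</l>
 <l><seg xml:id="L6">B</seg>reath it less gently forth …</l>
 <l><seg xml:id="L7">E</seg>ndymion, recollect, when Luna …</l>
 <l><seg xml:id="L8">T</seg>o cure his love …</l>
 <l><seg xml:id="L9">H</seg>is follie …</l>
 <join targets="#L1 #L2 #L3 #L4 #L5 #L6 #L7 #L8 #L9" result="name">
  <desc>The beloved's name</desc>
 </join>
</lg>"""


def realise_joins(root):
    """Yield, for each <join>, a new element built from the text of its targets."""
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    for join in root.iter("join"):
        virtual = ET.Element(join.get("result", "seg"))
        # Targets are concatenated in the order they are listed in targets.
        virtual.text = "".join(
            "".join(by_id[ref.lstrip("#")].itertext())
            for ref in join.get("targets").split()
        )
        yield virtual


root = ET.fromstring(POEM)
for element in realise_joins(root):
    print(ET.tostring(element, encoding="unicode"))  # <name>ELIZABETH</name>
```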

27. Elements as alternatives to one another

The <alt> element is used to indicate that two or more elements are mutually exclusive alternatives. <altGrp> is provided to aggregate <alt>s.

Example: the first time we transcribed this text, we saw WILLILAM W. AND EMELYN STORY,
but on another look it says WILLIAM W. AND EMELYN STORY.
Is this a genuine change since our first visit, or just a mistake? Let's keep both:
<ab xml:id="W1">WILLILAM W. AND EMELYN STORY</ab>
<ab xml:id="W2">WILLIAM W. AND EMELYN STORY</ab>
<alt mode="excl" targets="#W1 #W2"/>
The weights and mode attributes can assign a weight to each alternative, and allow for relationships other than mutual exclusion.

28. Another way to express alternation

The global exclude attribute can be used by any element to indicate another element to which it is allergic:
<ab exclude="#W4" xml:id="W3">WILLILAM W. AND EMELYN STORY</ab>
<ab exclude="#W3" xml:id="W4">WILLIAM W. AND EMELYN STORY</ab>

29. Conclusions on linking

The linking module provides a wide range of tools to let you describe relationships between parts of your text. If you need these, remember:
  • You should work out a naming scheme to assign ID attributes. You will need a lot of them
  • There are often several ways to do things; use the more specialized markup when you can to make it easier for others to read. Don't rely on type attributes with undefined meanings everywhere
  • Control your vocabulary for token attributes like type
  • The TEI only takes you as far as markup. Implementing all this to make a fancy interactive text exploration web site may be a lot of work.

30. The TEI table model

  • simple <table>, <row> and <cell> with optional <head>
  • rows and cells can have a span
  • rows and cells have role attribute to distinguish data from labels
  • No allowance for:
    • column width specification
    • cell alignment
    • cell borders, vertical or horizontal rules
    although many of these can be specified using the general rend attribute (which can then, for example, be passed to HTML's class attribute for rendition by CSS).

31. Simple table

What               | Due date     | Who                | Done?
OWL registrations  | Sep 13       | Sebastian          | yes
Registration       | Sep 18       | Everett or Bev     | nearly
Catering layout    | Sep 18/19/20 | Everett, Bev, Judy | of course
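<table>
 <row role="label">
  <cell>What</cell>
  <cell>Due date</cell>
  <cell>Who</cell>
  <cell>Done?</cell>
 </row>
 <row>
  <cell>OWL registrations</cell>
  <cell>Sep 13</cell>
  <cell>Sebastian</cell>
  <cell>yes</cell>
 </row>
 <row>
  <cell>Registration</cell>
  <cell>Sep 18</cell>
  <cell>Everett or Bev</cell>
  <cell>nearly</cell>
 </row>
 <row>
  <cell>Catering layout</cell>
  <cell>Sep 18/19/20</cell>
  <cell>Everett, Bev, Judy</cell>
  <cell>of course</cell>
 </row>
</table>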
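The role attribute is what lets software tell labels from data. As a rough illustration, this Python sketch reads the table into one record per data row, keyed by the label row. It assumes a single label row that comes first and no row or column spans; table_records is a hypothetical name.

```python
import xml.etree.ElementTree as ET

TABLE = """<table>
 <row role="label">
  <cell>What</cell><cell>Due date</cell><cell>Who</cell><cell>Done?</cell>
 </row>
 <row>
  <cell>OWL registrations</cell><cell>Sep 13</cell><cell>Sebastian</cell><cell>yes</cell>
 </row>
 <row>
  <cell>Registration</cell><cell>Sep 18</cell><cell>Everett or Bev</cell><cell>nearly</cell>
 </row>
 <row>
  <cell>Catering layout</cell><cell>Sep 18/19/20</cell><cell>Everett, Bev, Judy</cell><cell>of course</cell>
 </row>
</table>"""


def table_records(xml_text):
    """Return data rows as dicts keyed by the cells of the role="label" row."""
    table = ET.fromstring(xml_text)
    header, records = None, []
    for row in table.iter("row"):
        cells = ["".join(cell.itertext()).strip() for cell in row.iter("cell")]
        if row.get("role") == "label":
            header = cells
        elif header is None:
            raise ValueError("data row found before the label row")
        else:
            records.append(dict(zip(header, cells)))
    return records


for record in table_records(TABLE):
    print(record["What"], "->", record["Done?"])
```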

32. Specify table appearance using rend

Add a rend attribute to cells (in the extreme case, add an ID to every cell!), and use this in your display processing:

<table>
 <row>
  <cell rend="yes">cats</cell>
  <cell rend="no">dogs</cell>
 </row>
 <row>
  <cell rend="yes">apples</cell>
  <cell rend="no">oranges</cell>
 </row>
</table>

33. Rendering a table in HTML

Generate HTML in which each rend value simply becomes a class attribute:

  <style>
    td.yes {
     color: red;
     border: solid green 2px;
     padding-left: 10px;
     padding-right: 10px;
    }
    td.no {
     border-bottom: solid red 1pt;
     text-align: center;
     padding: 20px;
     width: 30px;
    }
  </style>
  <table>
   <tr>
    <td class="yes">cats</td>
    <td class="no">dogs</td>
   </tr>
   <tr>
    <td class="yes">apples</td>
    <td class="no">oranges</td>
   </tr>
  </table>
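The rend-to-class step can be sketched as a small recursive copy from TEI to HTML. This Python version maps table, row and cell to table, tr and td, turns rend into class, turns cols and rows into colspan and rowspan, and uses th for label cells. It assumes, as an illustration, that the four cells form two rows of two; any other element is copied under its TEI name, which a real converter would also need to map. to_html is a hypothetical name.

```python
import xml.etree.ElementTree as ET

TEI_TABLE = """<table>
 <row><cell rend="yes">cats</cell><cell rend="no">dogs</cell></row>
 <row><cell rend="yes">apples</cell><cell rend="no">oranges</cell></row>
</table>"""

HTML_NAMES = {"table": "table", "row": "tr", "cell": "td"}


def to_html(tei, in_label_row=False):
    """Copy a TEI table into HTML, turning rend into class and label cells into th."""
    is_label = tei.get("role") == "label" or in_label_row
    if tei.tag == "cell":
        name = "th" if is_label else "td"
    else:
        name = HTML_NAMES.get(tei.tag, tei.tag)
    html = ET.Element(name)
    if tei.get("rend"):
        html.set("class", tei.get("rend"))
    for tei_attr, html_attr in (("cols", "colspan"), ("rows", "rowspan")):
        if tei.get(tei_attr):
            html.set(html_attr, tei.get(tei_attr))
    html.text = tei.text
    for child in tei:
        # Cells inherit "label" status from a row with role="label".
        converted = to_html(child, in_label_row=(tei.tag == "row" and is_label))
        converted.tail = child.tail
        html.append(converted)
    return html


print(ET.tostring(to_html(ET.fromstring(TEI_TABLE)), encoding="unicode"))
```

Keeping the styling in CSS keyed on class means the TEI only records which category a cell belongs to, not how it should look.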

34. TEI figure model

We distinguish between:
  1. <figure>: the container for a picture or set of pictures; this may have a caption (using <head>) and an editorial description (<figDesc>)
  2. <graphic>: a link to an external picture. This can occur inside a <figure> or in an inline context
<graphic> has attributes for scale, width and height, to allow resizing of external images.

35. Inline graphics

The menu icons in the top bar ([image: icons.png]) are used to launch, respectively, Thunderbird, Acrobat and oXygen.

<p>The menu icons in the top bar (<graphic url="icons.png"/>)
are used to launch, respectively, Thunderbird,
Acrobat and oXygen.</p>

36. An illustration

[image: stone.png. The picture shows a simple plaque with no decoration, and an incised inscription]

Molly Cotton was the doyenne of archaeology in Rome for many years.

Figure 1. The grave of a famous archaeologist
<figure>
 <head>The grave of a famous archaeologist</head>
 <figDesc>The picture shows a simple plaque with no
   decoration, and an incised inscription</figDesc>
 <graphic url="stone.png" width="3in"/>
 <p>Molly Cotton was the doyenne of archaeology
   in Rome for many years.</p>
</figure>

37. Multiple elements to a figure

<figure>
 <head>The grave of a famous archaeologist</head>
 <figure>
  <graphic url="stone.png"/>
 </figure>
 <figure>
  <head>Detail of incised cross</head>
  <graphic url="cross-on-stone.png"/>
 </figure>
</figure>

38. Embedded graphics using SVG

If the SVG schema is included in your TEI customization, you can add <svg> to the content model of <figure>
Figure 2. Where in the world?

39. Source of embedded graphic example

 <head>Where in the world?</head>

40. Linking page images of text

How do you create a digital text combining marked-up transcription with pictures of (e.g.) manuscript pages? Suggestions:
  • attach ad hoc information to the <pb> element, and interpret it entirely in the processing application
  • extend <pb> to have a more explicit link to an image
  • associate pages (defined using <pb> or some other linking markup) with graphic images in the header
  • describe the combination using another metadata scheme such as METS
or join in the SIG on the TEI wiki at http://www.tei-c.org.uk/wiki/index.php/FacsimileMarkup

41. Math

TEI does very little with mathematics, simply providing a <formula> element to contain markup in some external scheme (such as TeX):
<formula notation="tex">\sqrt{e=mc^2}</formula>
but with an adjustment in your schema, embedded MathML would also be an option:
<p xmlns:m="http://www.w3.org/1998/Math/MathML"> …the
behaviour of <formula notation="MathML">
  <m:math overflow="scroll">
   …
  </m:math>
 </formula> as a function of the layer thickness for an
electron of 100 keV and 1 GeV of kinetic
energy in Argon, Silicon and Uranium.</p>

Sebastian Rahtz. Date: February 2007
Copyright University of Oxford