IT Services, University of Oxford

1. Basic structure(s)

  • Every TEI-conformant document comprises a header followed by (at least one) text
  • the header contains:
    • mandatory file description
    • optional encoding, profile and revision descriptions
  • the header is essential for:
    • bibliographic control and identification
    • resource documentation and processing
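
As a rough sketch of the points above, a minimal document might look like the following. The title and the prose inside the header are placeholders, not taken from any real file; only the mandatory file description is included, since the other header components are optional.

<TEI xmlns="http://www.tei-c.org/ns/1.0">
 <teiHeader>
  <fileDesc>
   <titleStmt>
    <title>A placeholder title</title>
   </titleStmt>
   <publicationStmt>
    <p>Unpublished demonstration file</p>
   </publicationStmt>
   <sourceDesc>
    <p>No source: created in electronic form</p>
   </sourceDesc>
  </fileDesc>
 </teiHeader>
 <text>
  <body>
   <p>The text itself goes here.</p>
  </body>
 </text>
</TEI>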

2. Structure of a TEI text

  • In the simplest case, a text just consists of paragraphs or clauses or verse lines
  • A TEI text has a little more structure: it contains
    • optional front matter
    • optional back matter
    • a body
  • We'll see some more complex cases later.
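
In outline, and assuming the usual order of front matter, body, then back matter, the structure might be sketched like this. The comments stand in for real content, which a valid document would need to supply.

<text>
 <front>
<!-- optional: title page, preface, etc. -->
 </front>
 <body>
  <p>The text proper...</p>
 </body>
 <back>
<!-- optional: appendices, index, etc. -->
 </back>
</text>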

3. The body of a text usually has divisions

  • usually nested one within another
  • the type attribute labels a particular level e.g. as "part" or "chapter"
  • the n attribute gives a particular division a name or number
  • the xml:id attribute gives a particular division a unique identifier

4. For example...

<!-- titlepage, etc here -->
  <div type="book" n="I" xml:id="JA0100">
   <head>Book I.</head>
   <div type="chapter" n="1" xml:id="JA0101">
    <head>Of writing lives in general...</head>
<!-- remainder of chapter 1 here -->
   </div>
   <div n="2" xml:id="JA0102">
<!-- chapter 2 here -->
   </div>
<!-- remainder of book 1 here -->
  </div>
  <div type="book" n="II" xml:id="JA0200">
<!-- book 2 here -->
  </div>
<!-- remaining books here -->

NB. divisions always tessellate: once a text is divided, no loose material may appear between or after sibling divisions.

5. TEI global attributes

  • The attribute class att.global defines these for all elements:
    • xml:id supplies a unique identifier
    • n supplies a (non-unique) name or number
    • rend gives a suggestion about rendition (appearance)
    • xml:lang identifies the language using an ISO standard code
  • The linking module extends this class with:
    • corresp, synch, ana for specific association types
    • next, prev for aggregating fragmented elements
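
A small illustration of these attributes in use. The identifiers, the rend value, and the text are all invented for the example; next and prev link the two halves of a quotation that is interrupted by narration.

<p xml:id="P12" n="12" rend="indented">She replied
 <q xml:id="Q1a" next="#Q1b" xml:lang="la">Veni, vidi,</q>
 after a pause,
 <q xml:id="Q1b" prev="#Q1a" xml:lang="la">vici.</q></p>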

6. Text components

What are divisions composed of?
  • prose is mostly paragraphs (<p>)
  • verse is mostly lines (<l>), sometimes in hierarchic groups (<lg>)
  • drama is mostly speeches (<sp>) containing <p> or <l> elements interspersed with stage directions (<stage>)
These may be mixed, and may also appear directly within undivided texts.

... but divisions can also contain embedded texts (<floatingText>) or <quote> elements.

7. For example

<div type="book">
 <l>Of Man's first disobedience, and the fruit</l>
 <l>Of that forbidden tree whose mortal taste</l>
 <l>Brought death into the World, and all our woe,</l>
 <l>With loss of Eden...</l>
</div>

<lg type="haiku">
 <l n="1">Summer grass —</l>
 <l n="2">all that's left</l>
 <l n="3">of warriors' dreams</l>
</lg>

8. For example

<stage>Enter Barnardo and Francisco,
two Sentinels, at several doors</stage>
<sp who="Barnardo">
 <l part="f">Who's there?</l>
</sp>
<sp who="Francisco">
 <l>Nay, answer me. Stand and unfold yourself.</l>
</sp>
<sp who="Barnardo">
 <l part="i">Long live the king!</l>
</sp>
<sp who="Francisco">
 <l part="m">Barnardo?</l>
</sp>
<sp who="Barnardo">
 <l part="f">He.</l>
</sp>

9. not to mention

<p>.... And he wrote on one side
of the paper:</p>
<quote>
  <p>PIGLIT (ME)</p>
</quote>
<p>and on the other side:</p>
<quote>IT'S ME PIGLIT, HELP HELP</quote>
<p>Then he put the paper in the bottle...</p>

10. What are speeches, paragraphs, and lines made of?

  • phrases that are conventionally typographically distinct
  • “data-like” (names, numbers, dates, times, addresses)
  • editorial interventions (corrections, regularizations, additions, omissions ...)
  • cross references and links
  • lists, notes, graphics, tables, bibliographic citations...
  • all kinds of annotations!

Which of these you need to mark up will depend on your research agenda.

11. for example...

<head>Of writing lives in general, and particularly of
<title>Pamela</title>, with a word by the bye of
<name key="#CIBC03">Colley Cibber</name> and others.</head>
<p>It is a trite but true observation, that
<q>examples work more forcibly on the mind
   than precepts</q>
 <name key="#JA">Mr. Joseph Andrews</name>,
<rs>the hero of our ensuing history</rs>, was
esteemed to be ...</p>

12. Direct speech

  • Use the who attribute to show speakers
  • Speeches can be nested in other speeches
<q who="Wilson"> Spaulding, he came down into
the office just this day eight weeks with
this very paper in his hand, and he
says: <q who="Spaulding">I wish to
   the Lord, Mr. Wilson, that I was a
   red-headed man.</q>
<!-- remainder of Wilson's speech here --></q>

13. Foreign language phrases

  • The xml:lang attribute may be attached to any element
  • Use <foreign> if nothing else is available
  • Use a BCP 47 language tag, built on ISO 639 codes (e.g. "de", "fr"), to identify the language
Have you read <title xml:lang="de">Die Dreigroschenoper</title>?
<mentioned xml:lang="fr">Savoir-faire</mentioned>
is French for know-how.
John has real <foreign xml:lang="fr">savoir-faire</foreign>.

14. Correction and Regularization

  • <corr> marks a correction
  • <sic> marks a (deliberate) non-correction
  • <reg> marks a regularization
  • <orig> marks something deliberately un-normalized
  • Use <choice> to indicate a combination of possible encodings

15. For his nose was as sharp as a pen and a table of green feelds

... and <reg>he</reg>
<corr resp="#Theobald">babbl'd</corr> ...
... and
 <corr resp="#Theobald">babbl'd</corr>
of green
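
Assuming the reading in the slide title is the original and Theobald's emendation is the correction, both views can be kept side by side with <choice>. This is one plausible encoding, not necessarily the one shown on the original slide.

... and <choice><orig>a</orig><reg>he</reg></choice>
<choice>
 <sic>table</sic>
 <corr resp="#Theobald">babbl'd</corr>
</choice>
of green <choice><orig>feelds</orig><reg>fields</reg></choice>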

16. ‘Inter’ class elements

  • <list> lists of all kinds
  • <note> notes (authorial or editorial)
  • <figure> pictures or figures
  • <table> tables
  • <bibl> bibliographic descriptions

17. Lists

  • use <list> for lists of any kind (use type attribute to distinguish)
  • use <label> in two-column lists as alternative to n attribute
  • may be nested as necessary

18. for example...

<list type="xmas">
 <label>For my true love</label>
 <item>
  <list type="bullets">
   <item>three calling birds</item>
   <item>two French hens</item>
   <item>a partridge in a pear tree</item>
  </list>
 </item>
 <label>For Uncle Joe</label>
 <item>socks as usual</item>
</list>

19. Figures and graphics

The presence of a graphic is indicated by the <graphic> element, usually contained within a <figure> element which groups together:
  • The title of the graphic (<head>)
  • A description of the graphic (<figDesc>) for use by software unable to render the graphic
  • The graphic resource itself is pointed to by a url attribute on the <graphic> element, which may also carry scale, height, and width attributes
  • Alternatively, it may be directly embedded within a <binaryObject> element
  • <figure>s may self-nest, and may also contain other display class items such as <formula>s

20. Example

<figure>
 <head>Mr Fezziwig's Ball</head>
 <figDesc>A Cruikshank engraving showing
   Mr Fezziwig leading a
   group of revellers.</figDesc>
 <graphic url="fezz.gif"/>
</figure>

21. Tables

  • a <table> element contains <row>s of <cell>s
  • spanning is indicated by rows and cols attributes
  • role attribute indicates whether row or column holds data or a label
  • embedded tables are permitted

22. for example...

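A three column table
Row1   123   4567
Row2   abc   defgh

<table>
 <row>
  <cell cols="3" role="label">A three column table</cell>
 </row>
 <row>
  <cell role="label">Row1</cell>
  <cell>123</cell>
  <cell>4567</cell>
 </row>
 <row>
  <cell role="label">Row2</cell>
  <cell>abc</cell>
  <cell>defgh</cell>
 </row>
</table>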

23. Bibliography

  • The <listBibl> element lists bibliographic citations
  • Individual citations may be represented loosely as <bibl> elements, or in a more structured way as <biblStruct> elements
  • In either case, elements from the model.biblPart class are used, e.g.
    • <author>, <editor>, (generic) <respStmt> etc.
    • <title> with optional level attribute to distinguish monographic, analytic etc.
    • <imprint> groups publication info (publisher, date etc.)
    • <biblScope> adds page references etc.
  • Individual citations may be linked to in the usual way

24. Example

<p>See for example <ref target="#REG92">Regis (1992)</ref>...</p>

<listBibl>
  <bibl xml:id="REG92">
   <author>Ed Regis</author>
   <title level="m">Great Mambo Chicken and
       the Trans-Human Experience</title>
   <pubPlace>London</pubPlace>
   <publisher>Penguin Books</publisher>
   <biblScope>pp 144 ff</biblScope>
  </bibl>
</listBibl>
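
For comparison, the same citation might be expressed in the more structured <biblStruct> form roughly as follows. The identifier REG92s is invented for the example, and the date is inferred from the "Regis (1992)" reference above.

<biblStruct xml:id="REG92s">
 <monogr>
  <author>Ed Regis</author>
  <title level="m">Great Mambo Chicken and
      the Trans-Human Experience</title>
  <imprint>
   <pubPlace>London</pubPlace>
   <publisher>Penguin Books</publisher>
   <date>1992</date>
  </imprint>
  <biblScope>pp 144 ff</biblScope>
 </monogr>
</biblStruct>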

25. Notes

  • Use <note> for notes of any kind (editorial or authorial)
  • if in-line, use place attribute to specify location
  • if out of line, either use
    • target attribute to specify attachment point
    • or mark attachment point as a <ref>

26. for example...

 <l>The self-same moment I could pray;</l>
 <l>And from my neck so free</l>
 <l>The albatross fell off, and sank</l>
 <l>Like lead into the sea.
 <note type="auth" place="margin">
     The spell begins to break.</note>
 </l>

 <l>The albatross fell off, and sank</l>
 <l xml:id="L213">Like lead into the sea.</l>

<note type="auth" place="margin" target="#L213">
The spell begins to break.</note>
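
The remaining option, marking the attachment point with a <ref>, might look like this. The identifier N213 and the asterisk used as the reference marker are assumptions made for the example.

 <l>Like lead into the sea.<ref target="#N213">*</ref></l>
<!-- ... elsewhere, e.g. with the other notes ... -->
<note xml:id="N213" type="auth">The spell begins to break.</note>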

27. Feeling overwhelmed?

All of this is just one way of looking at the TEI.

  1. The TEI is a modular system: you use it to build an encoding scheme appropriate to your needs, by selecting specific modules
  2. Each module defines a group of elements and attributes
  3. Elements are classified structurally and semantically

Define your goals before using the TEI!
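
As a sketch of what selecting modules can mean in practice, a TEI customization (ODD) file might list the chosen modules like this. The ident value is invented, and the selection shown is only an example, suited to encoding verse and drama.

<schemaSpec ident="myTEI" start="TEI">
 <moduleRef key="tei"/>
 <moduleRef key="header"/>
 <moduleRef key="core"/>
 <moduleRef key="textstructure"/>
 <moduleRef key="verse"/>
 <moduleRef key="drama"/>
</schemaSpec>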

28. Some other modules

Your choice from:
  1. Transcription of spoken texts
  2. Dictionaries and lexica
  3. Varieties of linguistic annotation
  4. Nonstandard characters and glyphs
  5. Linking, alignment, non-hierarchic structures
  6. Detailed metadata (the TEI Header)
  7. Manuscript Description
  8. Text-critical apparatus
  9. Physical description
  10. Onomastics and ontologies
  11. The ODD system

29. Exploring TEI P5

Feedback and advice available to all on tei-l@listserv.brown.edu

Date: Feb 2007
Copyright University of Oxford