Contents

1. Representation of Primary Sources

This module is intended for use in the representation of primary sources, such as manuscripts or other written materials. Issues covered include:
  • digital facsimiles: including digital images in a TEI edition
  • altered, corrected, and erroneous texts
  • hands and responsibility
  • damage and conjecture
  • aspects of layout
  • headers, footers, and similar matter

3. Digital Facsimiles

Including digital images in a TEI edition.
  • <facsimile> contains a representation of some written source in the form of a set of images rather than as transcribed or encoded text.
  • <surface> defines a written surface in terms of a rectangular coordinate space, optionally grouping one or more graphic representations of that space, and rectangular zones of interest within it.
    • @start points to an element which encodes the starting position of the text corresponding to the inscribed part of the surface.
  • <zone> defines a rectangular area contained within a <surface> element.
  • att.global.facs groups elements which can be associated with an image or a surface within a facsimile element.
    • @facs (facsimile) points directly to an image, or to a part of a facsimile element which corresponds with this element.

4. Simplest case: using @facs for 1:1 mapping

If a digital text contains one image per page or column (or similar unit), and no more complex mapping between text and image is envisaged, then the @facs attribute may be used to point directly to a graphic resource.
<TEI>
 <teiHeader>
<!--...-->
 </teiHeader>
 <text>
  <pb facs="LSH-416.pdf" n="416"/>
  <head>THE LEGEND OF SLEEPY HOLLOW</head>
<!-- Page 416 continues -->
  <pb facs="LSH-417.pdf" n="417"/>
  <lb n="1"/>of the quietest places in the whole world. A small brook
 
<!-- Page 417 continues -->
 </text>
</TEI>

5. Using @facs in conjunction with <facsimile>, <surface>, and <zone>

Using these attributes and elements together enables an editor to
  • associate multiple images with each page
  • record arbitrary planar coordinates of textual elements on any kind of written surface and link such elements to digital facsimile images of them

6. <facsimile>

The facsimile element is used to represent a digital facsimile. It appears within a TEI document along with, or instead of, the text element introduced in section 5 Default Text Structure. When this module is selected, a legal TEI document may therefore comprise any of the following:
  • a TEI Header and a text element
  • a TEI Header and a facsimile element
  • a TEI Header, a facsimile element, and a text element
<TEI>
 <teiHeader>
<!--...-->
 </teiHeader>
 <facsimile>
  <graphic url="LSH-416.pdf"/>
  <graphic url="LSH-417.pdf"/>
  <graphic url="LSH-418.pdf"/>
  <graphic url="LSH-419.pdf"/>
 </facsimile>
 <text>
<!-- -->
 </text>
</TEI>
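
The second combination listed above, a TEI Header with a facsimile element and no text element, might be sketched as follows. This is only an illustration: it reuses two of the image file names from the example above and omits the transcription altogether.
<TEI>
 <teiHeader>
<!--...-->
 </teiHeader>
 <facsimile>
  <graphic url="LSH-416.pdf"/>
  <graphic url="LSH-417.pdf"/>
 </facsimile>
</TEI>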

7. <surface>

The <surface> element may be used to indicate that there are two image files corresponding with the same area of the work:
<facsimile>
 <surface>
  <graphic url="LSH-416.pdf"/>
  <graphic url="psnypl_berg_979.jpg"/>
 </surface>
 <graphic url="LSH-417.pdf"/>
 <graphic url="LSH-418.pdf"/>
</facsimile>

8. Dimensions

The actual dimensions of the object represented are not documented by the surface element; instead, the surface is located within an abstract coordinate space, which is defined by the following attributes, supplied by the att.coordinated class:
  • @ulx gives the x coordinate value for the upper left corner of a rectangular space.
  • @uly gives the y coordinate value for the upper left corner of a rectangular space.
  • @lrx gives the x coordinate value for the lower right corner of a rectangular space.
  • @lry gives the y coordinate value for the lower right corner of a rectangular space.

9. Example

<facsimile>
 <surface
   ulx="93"
   uly="681"
   lrx="967"
   lry="1568">

<!-- ... -->
 </surface>
</facsimile>

10. <zone> in <surface>

To describe the whole image, we will also need to define a zone of interest which represents an area larger than this surface. This zone of interest can be defined by a <zone> element, within which we can place the uncropped <graphic>:

<facsimile>
 <surface
   ulx="93"
   uly="681"
   lrx="967"
   lry="1568">

  <zone
    ulx="0"
    uly="0"
    lrx="993"
    lry="1639">

   <graphic url="psnypl_berg_979.jpg"/>
  </zone>
 </surface>
</facsimile>

11. <desc>

The <desc> element may also be used within either <surface> or <zone> to provide some further information about the area being defined. Where a single image shows more than one surface, each surface must specify a bounding box which encloses the appropriate page, as well as defining the zone for the graphic itself:

12. <desc> Example

<facsimile>
 <surface
   ulx="96"
   uly="89"
   lrx="950"
   lry="657">

  <desc>front matter</desc>
  <zone
    ulx="0"
    uly="0"
    lrx="993"
    lry="1639">

   <graphic url="psnypl_berg_979.jpg"/>
  </zone>
 </surface>
 <surface
   ulx="93"
   uly="681"
   lrx="967"
   lry="1568">

  <desc>main text</desc>
  <zone
    ulx="0"
    uly="0"
    lrx="993"
    lry="1639">

   <graphic url="psnypl_berg_979.jpg"/>
  </zone>
 </surface>
</facsimile>

13. More uses for <zone>

In addition to acting as a container for <graphic> elements, <zone> elements may also be used to select parts of each surface for analytical purposes.

<facsimile>
 <surface
   ulx="93"
   uly="681"
   lrx="967"
   lry="1568">

  <desc>main text</desc>
  <zone
    ulx="0"
    uly="0"
    lrx="993"
    lry="1639">

   <graphic url="psnypl_berg_979.jpg"/>
  </zone>
  <zone
    ulx="507"
    uly="1109"
    lrx="707"
    lry="1163">

   <desc>supralinear addition</desc>
  </zone>
 </surface>
</facsimile>

14. Aligning the transcription with facsimile elements

  1. give each relevant part of the facsimile an identifier
  2. using the @facs attribute, point from the transcription into the <facsimile>
<facsimile>
 <surface
   ulx="96"
   uly="89"
   lrx="950"
   lry="657"
   xml:id="SH-front-facs">

  <desc>front matter</desc>
  <zone
    ulx="0"
    uly="0"
    lrx="993"
    lry="1639">

   <graphic url="psnypl_berg_979.jpg"/>
  </zone>
 </surface>
 <surface
   ulx="93"
   uly="681"
   lrx="967"
   lry="1568"
   xml:id="SH-pg1-facs">

  <desc>main text</desc>
  <zone
    ulx="0"
    uly="0"
    lrx="993"
    lry="1639">

   <graphic url="psnypl_berg_979.jpg"/>
  </zone>
  <zone
    ulx="507"
    uly="1109"
    lrx="707"
    lry="1163"
    xml:id="SH-add1-facs">

   <desc>supralinear addition</desc>
  </zone>
<!-- ... -->
 </surface>
<!-- ... -->
</facsimile>
<text>
 <front facs="#SH-front-facs">
  <div>
   <head>THE LEGEND OF SLEEPY HOLLOW.</head>
   <head>FOUND AMONG THE PAPERS OF THE LATE DIEDRICH KNICKERBOCKER </head>
   <lg>
    <l>A pleasing land of drowsy head it was,</l>
    <l> Of dreams that wave before the half-shut eye;</l>
    <l>And of gay castles in the clouds that pass,</l>
    <l> For ever flushing round a summer sky.</l>
    <trailer>CASTLE OF INDOLENCE.</trailer>
   </lg>
  </div>
 </front>
 <body>
  <div>
   <p>
    <pb facs="#SH-pg1-facs"/>IN the bosom of one of those spacious coves which indent
       the eastern shore of the Hudson, at that broad expansion of the river
       denominated by the ancient Dutch navigators the Tappan Zee, and where they
       always <add facs="#SH-add1-facs">prudently</add> shortened sail, and implored the
       protection of St. Nicholas when they crossed, there lies a
<!-- continues -->
   </p>
<!-- continued -->
  </div>
 </body>
</text>

15. Using @start to link from <facsimile> to transcription

It is also possible to point in the other direction, from a <surface> or <zone> to the corresponding text. This is the function of the @start attribute, which supplies the identifier of the element containing the transcribed text found within the <surface> or <zone> concerned.

<facsimile>
 <surface
   ulx="96"
   uly="89"
   lrx="950"
   lry="657"
   start="#SH-front">

  <desc>front matter</desc>
  <zone
    ulx="0"
    uly="0"
    lrx="993"
    lry="1639">

   <graphic url="psnypl_berg_979.jpg"/>
  </zone>
 </surface>
 <surface
   ulx="93"
   uly="681"
   lrx="967"
   lry="1568"
   start="#SH-pg1">

  <desc>main text</desc>
  <zone
    ulx="0"
    uly="0"
    lrx="993"
    lry="1639">

   <graphic url="psnypl_berg_979.jpg"/>
  </zone>
  <zone
    ulx="507"
    uly="1109"
    lrx="707"
    lry="1163"
    start="#SH-add1">

   <desc>supralinear addition</desc>
  </zone>
<!-- ... -->
 </surface>
<!-- ... -->
</facsimile>
<text>
 <front xml:id="SH-front">
  <div>
   <head>THE LEGEND OF SLEEPY HOLLOW.</head>
   <head>FOUND AMONG THE PAPERS OF THE LATE DIEDRICH KNICKERBOCKER </head>
   <lg>
    <l>A pleasing land of drowsy head it was,</l>
    <l> Of dreams that wave before the half-shut eye;</l>
    <l>And of gay castles in the clouds that pass,</l>
    <l> For ever flushing round a summer sky.</l>
    <trailer>CASTLE OF INDOLENCE.</trailer>
   </lg>
  </div>
 </front>
 <body>
  <div>
   <p>
    <pb n="1" xml:id="SH-pg1"/>IN the bosom of one of those spacious coves which
       indent the eastern shore of the Hudson, at that broad expansion of the river
       denominated by the ancient Dutch navigators the Tappan Zee, and where they
       always <add xml:id="SH-add1">prudently</add> shortened sail, and implored the
       protection of St. Nicholas when they crossed, there lies a
<!-- continues -->
   </p>
<!-- continued -->
  </div>
 </body>
</text>

16. Live Example

<facsimile>
 <surface
   xml:id="grave"
   ulx="0"
   uly="0"
   lrx="355"
   lry="678">

  <graphic url="gravestone-cropped.jpg"/>
  <zone
    ulx="83"
    uly="223"
    lrx="272"
    lry="256"
    xml:id="line1"/>

  <zone
    ulx="92"
    uly="251"
    lrx="256"
    lry="282"
    xml:id="line2"/>

  <zone
    ulx="21"
    uly="281"
    lrx="330"
    lry="308"
    xml:id="line3"/>

  <zone
    ulx="36"
    uly="306"
    lrx="320"
    lry="332"
    xml:id="line4"/>

  <zone
    ulx="85"
    uly="535"
    lrx="249"
    lry="556"
    xml:id="line5"/>

  <zone
    ulx="97"
    uly="556"
    lrx="241"
    lry="576"
    xml:id="line6"/>

  <zone
    ulx="58"
    uly="577"
    lrx="281"
    lry="595"
    xml:id="line7"/>

  <zone
    ulx="68"
    uly="595"
    lrx="271"
    lry="613"
    xml:id="line8"/>

 </surface>
</facsimile>
<text>
 <body>
  <div facs="#grave">
   <p>Private Moulds' gravestone</p>
   <div>
    <ab>
     <s facs="#line1">12851 PRIVATE</s>
     <lb/>
     <s facs="#line2">H. MOULDS</s>
     <lb/>
     <s facs="#line3">NORTHAMPTONSHIRE REGT.</s>
     <lb/>
     <s facs="#line4">23RD JULY 1916 AGED 21</s>
    </ab>
    <ab>
     <s facs="#line5">LOVING SON OF </s>
     <lb/>
     <s facs="#line6">MRS MOULDS</s>
     <lb/>
     <s facs="#line7">PETERBORO, ENGLAND</s>
     <lb/>
     <s facs="#line8">FOR EVER WITH US</s>
     <lb/>
    </ab>
   </div>
  </div>
 </body>
</text>

http://users.ox.ac.uk/~rahtz/test4.html

17. Other elements for transcriptional work

Defined in Core:

abbr add choice corr del expan gap sic

Defined in transcr:

addSpan am damage damageSpan delSpan ex facsimile fw handNotes handShift restore space subst supplied surface zone

18. Abbreviation and Expansion

A manuscript abbreviation may be viewed in two ways.
  • One may transcribe it as a particular sequence of letters or marks upon the page: thus, a ‘p with a bar through the descender’, a ‘superscript hook’, a ‘macron’.
  • One may also interpret the abbreviation in terms of the letter or letters it is seen as standing for: thus, ‘per’, ‘re’, ‘n’.

Both of these views are supported by these Guidelines.

19. Examples

eu<g ref="#er">er</g>y <g ref="#per">per</g>sone that loketh after heuen hath a place in this ladder
<abbr>eu<g ref="#er">er</g>y</abbr>
<abbr>
 <g ref="#per">per</g>sone
</abbr> ...
<expan>euery</expan>
<expan>persone</expan> ...
<choice>
 <abbr>eu<g ref="#er">er</g>y</abbr>
 <expan>euery</expan>
</choice>

20. New elements in TEI P5 1.0: <ex> and <am>

Using these elements, a transcriber may indicate the status of the individual letters or signs within both the abbreviation and the expansion.
  • <ex> (editorial expansion) contains a sequence of letters added by an editor or transcriber when expanding an abbreviation.
  • <am> (abbreviation marker) contains a sequence of letters or signs present in an abbreviation which are omitted or replaced in the expanded form of the abbreviation.
Previously, people have used existing elements such as <hi> and <supplied> to mark individual letters/signs in abbreviations and expansions. With <am> and <ex> the TEI seeks to build in support for this important aspect of transcription.

21. Abbreviation Examples

<abbr>eu<am>
  <g ref="#er"/>
 </am>y</abbr>
<abbr>
 <am>
  <g ref="#per"/>
 </am>sone
</abbr> ...
<expan>eu<ex>er</ex>y</expan>
<expan>
 <ex>per</ex>sone
</expan> ...
eu<choice>
 <am>
  <g ref="#er"/>
 </am>
 <ex>er</ex>
</choice>y <choice>
 <am>
  <g ref="#per"/>
 </am>
 <ex>per</ex>
</choice>sone ...

22. Correction and Conjecture

<sic>, <corr>, <choice> : Covered in Core

23. Correction/Conjecture Examples

Nos autem iam ostendimus quod nutrimentum
et <choice>
 <sic>angues</sic>
 <corr>augens</corr>
</choice>.

24. Additions and Deletions

  • <add> (addition) contains letters, words, or phrases inserted in the text by an author, scribe, annotator, or corrector.
  • <addSpan/> (added span of text) marks the beginning of a longer sequence of text added by an author, scribe, annotator or corrector (see also add).
  • <del> (deletion) contains a letter, word, or passage deleted, marked as deleted, or otherwise indicated as superfluous or spurious in the copy text by an author, scribe, annotator, or corrector.
  • <delSpan/> (deleted span of text) marks the beginning of a longer sequence of text deleted, marked as deleted, or otherwise signaled as superfluous or spurious by an author, scribe, annotator, or corrector.

25. <addSpan> and <delSpan>

These two elements delimit a span of text by pointing mechanisms rather than by enclosing it. This is useful if an addition or deletion overlaps another span of text.

@spanTo indicates the end of a span initiated by the element bearing this attribute.

<addSpan spanTo="#id4"/>
<!-- added text -->
<anchor xml:id="id4"/>
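
As a slightly fuller sketch, an addition that begins in one paragraph and ends in the next cannot be enclosed in a single <add>, but it can be delimited with <addSpan> and <anchor>. The placement of the addition, the identifier addEnd01, and the hand WI are all assumptions made up for illustration; they do not record a real feature of the manuscript.
<p>there lies a small market town or rural port, <addSpan hand="#WI" spanTo="#addEnd01"/>which
 by some is called Greensburgh.</p>
<p>This name was given, we are told, in former days,<anchor xml:id="addEnd01"/> by the good
 housewives of the adjacent country</p>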

26. <add> and <del> Examples

by the ancient Dutch navigators <del rend="strikethrough" hand="#WI">of these waters</del> the Tappaan Zee, and where they
always <add hand="#WI" place="supralinear">prudently</add> shortened sail ...
<handNote xml:id="WI">Washington Irving holograph</handNote>

27. Substitutions

<subst> (substitution) groups one or more deletions with one or more additions when the combination is to be regarded as a single intervention in the text. (Some have used <choice> in this manner in the past.) Cases it may cover include:
  • one word written over another
  • one word deleted, replaced by another written above it by the same hand at one time
  • one word deleted, replaced by a different hand some other time
  • a long chain of substitutions on the one stretch of text, with uncertainty as to the order of substitution and as to which of many possible readings should be preferred

28. <subst> Examples

<l>
 <delSpan rend="verticalStrike" spanTo="#delend02"/> Tis moonlight <subst>
  <del>upon</del>
  <add>over</add>
 </subst> Oman's sky
</l>
<l>Her isles of pearl look lovelily<anchor xml:id="delend02"/>
</l>

By convention, deletion precedes addition. This may be overridden by means of the @seq attribute, which is particularly useful when a sequence of deletions and additions occurs.

One must have lived longer with <subst>
 <del seq="1">this</del>
 <del seq="2">
  <add seq="1">such a</add>
 </del>
 <add seq="2">a</add>
</subst> system, to appreciate its advantages.

29. More complex example

<l>And towards our distant rest began to trudge,</l>
<l>
 <subst>
  <del>Helping the worst amongst us</del>
  <add>Dragging the worst amongt us</add>
 </subst>, who'd no boots
</l>
<l>But limped on, blood-shod. All went lame; <subst>
  <del status="shortEnd">half-</del>
  <add>all</add>
 </subst> blind;</l>
<l>Drunk with fatigue ; deaf even to the hoots</l>
<l>Of tired, outstripped <del>fif</del> five-nines that dropped behind.</l>

30. Cancellation of Deletions and Other Markings

<restore> indicates restoration of text to an earlier state by cancellation of an editorial or authorial marking or instruction.

by the ancient Dutch navigators <restore hand="#WI2">
 <del rend="strikethrough" hand="#WI2">of these waters</del>
</restore> the Tappaan Zee, and where they always <add hand="#WI2" place="supralinear">prudently</add> shortened sail ...
<handNote xml:id="WI2">Washington Irving
holograph</handNote>

31. Text Omitted from or Supplied in the Transcription

  • <gap> indicates a point where material has been omitted in a transcription, whether for editorial reasons described in the TEI header, as part of sampling practice, or because the material is illegible or inaudible.
  • <supplied> signifies text supplied by the transcriber or editor for any reason, typically because the original cannot be read because of physical damage or loss to the original.

32. <gap> and <supplied> examples

expansion <gap reason="illegible" agent="water"/> river denominated
expansion <supplied reason="illegible" resp="#DP" source="#SH1862">of the</supplied> river denominated

33. <handNote> and <handShift>

The <handNote> element is used to provide information about each hand distinguished within the encoded document.

  • When the transcr module is used, the element <handNotes> is available, within the <profileDesc> element of the Header, to hold one or more <handNote> elements. (brief)
  • When the msdescription module is included, the <handDesc> element also becomes available as part of a structured manuscript description. (more robust)

It is possible to use the two elements together if, for example, the <handDesc> element contains a single summary describing all the hands discursively, while the <handNotes> element gives specific details of each.
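
A minimal sketch of the two elements used together, assuming both the transcr and msdescription modules are selected. The summary paragraph and the two hand descriptions are illustrative values only, and the rest of the header is elided.
<teiHeader>
 <fileDesc>
<!-- titleStmt, publicationStmt ... -->
  <sourceDesc>
   <msDesc>
    <msIdentifier>
<!-- ... -->
    </msIdentifier>
    <physDesc>
     <handDesc hands="2">
      <p>Two hands: a careful copperplate in brown ink and an unschooled pencil scrawl.</p>
     </handDesc>
    </physDesc>
   </msDesc>
  </sourceDesc>
 </fileDesc>
 <profileDesc>
  <handNotes>
   <handNote xml:id="h1" script="copperplate" medium="brown-ink">Carefully written with
     regular descenders</handNote>
   <handNote xml:id="h2" script="print" medium="pencil">Unschooled scrawl</handNote>
  </handNotes>
 </profileDesc>
</teiHeader>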

34. <handShift>

<handShift> marks the beginning of a sequence of text written in a new hand, or the beginning of a scribal stint.

<l>When wolde the cat dwelle in his ynne</l>
<handShift medium="greenish-ink"/>
<l>And if the cattes skynne be slyk <handShift medium="black-ink"/> and gaye</l>
<handNotes>
 <handNote xml:id="h1" script="copperplate" medium="brown-ink">Carefully written with
   regular descenders</handNote>
 <handNote xml:id="h2" script="print" medium="pencil">Unschooled scrawl</handNote>
</handNotes>
<handShift new="#h1" resp="#das" cert="medium"/>... and that good Order Decency and
regular worship may be once more introduced and Established in this Parish according to
the Rules and Ceremonies of the Church of England and as under a good Consciencious and
sober Curate there would and ought to be <handShift new="#h2" resp="#das" cert=""/> and
for that purpose the parishioners pray

35. @hand, @resp, @cert

<add
  place="supralinear"
  resp="#FB"
  hand="#WJ"
  cert="medium">
But</add>
<choice>
 <sic>One</sic>
 <corr resp="#FB" cert="high">one</corr>
</choice> must have lived ...
<!-- elsewhere -->
<respStmt xml:id="FB">
 <resp>editorial changes</resp>
 <name>Fredson Bowers</name>
</respStmt>
<respStmt xml:id="WJ">
 <resp>authorial changes</resp>
 <name>William James</name>
</respStmt>

36. Damage and Illegibility

Use <damage> if the text is damaged but can still be read with perfect confidence.

<p>
<!-- ... -->
 <pb n="5r"/>
 <damageSpan agent="rubbing" extent="whole leaf" spanTo="#damageEnd"/>
</p>
<p> .... </p>
<p> .... <pb n="5v" xml:id="damageEnd"/>
</p>

37. Disjoint Damage

IN the bosom <damage group="1">o</damage>f one of those spa<lb n="2"/>cious coves wh<damage group="1">ich
inde</damage>nt the eastern <lb n="3"/>shore of the <damage group="1">Hudson, at
</damage>that broad <lb n="4"/>expansion <damage group="1">of the r</damage>iver
denominated <lb n="5"/>by the ancie<damage>nt</damage> Dutch navigators

38. Damage and Illegibility

Use <unclear> if deletion or damage has left the text partly illegible, so that it can be read, but not with perfect confidence.

Use the @reason attribute to state the cause (damage, deletion, etc.) of the uncertainty in transcription and the @cert attribute to indicate the confidence in the transcription.

shore of the <unclear reason="damage" cert="medium">Hudson, at</unclear> that broad
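
Where a passage is both visibly damaged and hard to read, the two elements can be combined by nesting <unclear> inside <damage>. A minimal sketch on the same passage, assuming water as the agent of damage (an illustrative value, not a record of the actual manuscript):
shore of the <damage agent="water">
 <unclear reason="damage" cert="medium">Hudson, at</unclear>
</damage> that broad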

39. Missing and Supplied text

Where the transcriber considers that one or more words have been erroneously omitted in the original source and corrects this omission, the <supplied> element should be used in preference to <corr>.

by the ancient Dutch navigators
<supplied>of</supplied> the Tappan Zee

40. Next...?

Next James will tell us about Documentation Elements, ODD and Roma.



Dot Porter. Date: 2007-10-31
Copyright University of Oxford