This paper presents work in progress on the DTA “Base Format” for Manuscripts (DTABf-M), an extension of the DTA “Base Format” (DTABf) for the TEI-conformant annotation of manuscripts. The DTABf is a TEI subset for the consistent and unambiguous annotation of large amounts of historical text. In the course of our work on the DTA corpora, the DTABf has been continuously adapted to specific annotation needs. The latest addition, the DTABf-M, contains the elements, attributes, and values necessary for annotating (historical) handwritten documents. The goal is to provide a TEI format for diverse manuscripts in large text corpora. While the DTABf already covers a wide range of phenomena found in both printed texts and manuscripts, certain manuscript-specific features require additional representation in the DTABf-M. For the DTABf-M to be suitable for the DTA and its workflows, it must meet several prerequisites. First, it should be based on the original DTABf tagset and extend it only where unavoidable. Second, like the DTABf, it should be developed bottom-up, that is, based on actual phenomena found in handwritten texts that are transcribed and encoded using the DTABf. Third, it should complement the DTABf, not replace it; hence, a modular way of integrating the DTABf-M into the DTABf is needed. This paper describes how we address these issues in developing the DTABf-M.
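The three prerequisites can be illustrated with a minimal sketch of a modular tagset extension. This is only a toy model under stated assumptions, not the actual DTABf or DTABf-M schema: the names `BASE_TAGSET`, `MANUSCRIPT_MODULE`, `merge_tagsets`, and `additions` are hypothetical, and the elements and attributes listed are placeholders chosen for illustration rather than the real inventories of either format.

```python
# Hypothetical, heavily simplified tagsets: element name -> allowed attributes.
# These entries are placeholders, NOT the actual DTABf or DTABf-M inventories.
BASE_TAGSET = {
    "p": {"rendition"},
    "hi": {"rendition"},
    "note": {"place", "type"},
}

# Hypothetical manuscript module: it only lists what goes beyond the base.
MANUSCRIPT_MODULE = {
    "note": {"hand"},        # extends an existing element by one attribute
    "handShift": {"new"},    # adds an element that the base does not contain
}


def merge_tagsets(base, module):
    """Return a new tagset consisting of the base plus the module's additions.

    The base is copied, never mutated, and nothing is removed from it, so the
    module complements the base rather than replacing it.
    """
    merged = {name: set(attrs) for name, attrs in base.items()}
    for name, attrs in module.items():
        merged.setdefault(name, set()).update(attrs)
    return merged


def additions(base, module):
    """Report what the module adds: new elements and new attributes per element."""
    new_elements = sorted(set(module) - set(base))
    new_attributes = {
        name: sorted(module[name] - base[name])
        for name in module
        if name in base and module[name] - base[name]
    }
    return new_elements, new_attributes


if __name__ == "__main__":
    merged = merge_tagsets(BASE_TAGSET, MANUSCRIPT_MODULE)
    print(sorted(merged))
    print(additions(BASE_TAGSET, MANUSCRIPT_MODULE))
```

Keeping the module as a separate mapping means a document without manuscript features can still be checked against the base alone, while the `additions` report makes it easy to review whether each extension was really unavoidable.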