Paper information - Specifying a TEI-XML Based Format for Aligning Text to Image at Character Level

This paper presents the experience of specifying and implementing an XML format for text-to-image alignment at word and character level within the TEI framework. The format is a supplementary markup layer applied to heterogeneous transcriptions of medieval Latin and French manuscripts encoded using different "flavors" of the TEI (normalized for critical editions, diplomatic or palaeographic transcriptions). One of the problems that had to be solved was identifying "non-alignable" spans in various kinds of transcriptions. Originally designed in the framework of a research project on the ontology of letter-forms in medieval Latin and vernacular (mostly French) manuscripts and inscriptions, this format can be of use for all kinds of projects that involve fine-grained alignment of transcriptions with zones on digital images.
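To make the idea of character-level alignment concrete, here is a minimal sketch in Python. It does not reproduce the format specified in the paper. It assumes only generic TEI P5 conventions: image regions are declared as zone elements with ulx, uly, lrx and lry coordinates, and w and c elements point to them through the facs attribute. The sample fragment, the identifiers (z-w1, z-c1, ...), the file name page1.jpg and the function name character_boxes are all hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical fragment for illustration only; not taken from the paper.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <facsimile>
    <surface>
      <graphic url="page1.jpg"/>
      <zone xml:id="z-w1" ulx="100" uly="50" lrx="160" lry="90"/>
      <zone xml:id="z-c1" ulx="100" uly="50" lrx="118" lry="90"/>
      <zone xml:id="z-c2" ulx="118" uly="50" lrx="140" lry="90"/>
      <zone xml:id="z-c3" ulx="140" uly="50" lrx="160" lry="90"/>
    </surface>
  </facsimile>
  <text>
    <body>
      <p>
        <w facs="#z-w1"><c facs="#z-c1">r</c><c facs="#z-c2">e</c><c facs="#z-c3">x</c></w>
      </p>
    </body>
  </text>
</TEI>"""


def character_boxes(tei_source):
    """Return (character, (ulx, uly, lrx, lry)) pairs for every aligned <c>."""
    root = ET.fromstring(tei_source)

    # Index every zone by its xml:id so that facs pointers can be resolved.
    zones = {}
    for zone in root.iter(f"{{{TEI_NS}}}zone"):
        zone_id = zone.get(f"{{{XML_NS}}}id")
        box = tuple(int(zone.get(key)) for key in ("ulx", "uly", "lrx", "lry"))
        zones[zone_id] = box

    # Resolve each character's facs pointer ("#id") to its bounding box.
    # Characters without a resolvable pointer are skipped.
    pairs = []
    for char in root.iter(f"{{{TEI_NS}}}c"):
        target = char.get("facs", "").lstrip("#")
        if target in zones:
            pairs.append((char.text, zones[target]))
    return pairs


if __name__ == "__main__":
    for letter, box in character_boxes(SAMPLE):
        print(letter, box)
```

In this sketch the alignment lives entirely in the facs pointers, so the transcription itself is left unchanged apart from the added w and c wrappers. That is one plausible reading of a "supplementary markup layer"; how the paper's format treats non-alignable spans is not shown here.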

Alexei Lavrentiev | Yann Leydier | Dominique Stutzmann
