The inadequacy of embedded markup for cultural heritage texts

Embedded generalized markup, as applied by digital humanists to the recording and study of our textual cultural heritage, suffers from a number of serious technical drawbacks. As a result of its evolution from early printer control languages, generalized markup can only express a document’s ‘logical’ structure via a repertoire of permissible printed format structures. In addition to the well-researched overlap problem, the embedding of markup codes into texts that never had them when written leads to a number of further difficulties: the inclusion of potentially obsolescent technical and subjective information in texts that are supposed to be archivable for the long term, the manual encoding of information that could better be computed automatically, and the obscuring of the text by highly complex technical data. Many of these problems can be alleviated by separating the versions of which many cultural heritage texts are composed from their content. In this way the complex interconnections between versions can be handled automatically, leaving only simple markup for individual versions to be handled by the user.
