Towards an electronic variorum edition originating from available-quality document facsimiles

A variorum edition of an early modern literary text (an edition that brings together multiple copies examined by multiple editors, textual variants, and critical commentary) is a valuable scholarly resource for humanities researchers engaged in textual criticism and bibliographical studies. Organizing variorum editions in electronic form makes these texts more manageable, accessible, and flexible. The key issues in creating electronic variorum editions are how to automatically recognize poor-quality document images, how to sustain relationships among related data entities, how to edit documents with multi-variant content, and how to customize variorum editions according to the available data entities and readers' preferences. Addressing these issues requires solutions from several areas of computer science, including optical character recognition, human-computer interaction, text-based computing, and relational database management. Using Don Quixote de la Mancha by Miguel de Cervantes Saavedra as the textual model, this work presents an environment for creating and presenting electronic variorum editions from document facsimiles stored on microfilm. All data entities in the environment, and the relationships among them, are sustained throughout the process of creating an electronic variorum edition. These sustained relationships make the creation of hypertextual variorum editions more efficient and the resulting editions more flexible and comprehensive. The environment provides two synchronization mechanisms: one synchronizes texts with their facsimile images, and the other synchronizes corresponding passages across multiple editions. Together, these mechanisms facilitate the comparison of multi-edition, dual-form (text and image) documents, which is the most critical task in creating variorum editions.
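The comparison of corresponding passages across editions can be illustrated with a small sketch. The Python code below is not the system described in this work. It assumes plain-text transcriptions of two editions and uses word-level alignment with the standard library's `difflib.SequenceMatcher` (a technique chosen here only for illustration) to list variant readings and to map each word position in one edition to its counterpart in the other. The function names `align_editions` and `position_map` are hypothetical, and the sample strings are illustrative, not actual collation data.

```python
from difflib import SequenceMatcher


def align_editions(base: str, witness: str) -> list[tuple[str, str, str]]:
    """Align two editions word by word and report variant readings.

    Returns (tag, base_words, witness_words) for every region where the
    texts differ; tag is one of "replace", "delete", or "insert".
    """
    a, b = base.split(), witness.split()
    # autojunk=False so frequent words such as "de" are never discarded.
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    variants = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            variants.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return variants


def position_map(a_words: list[str], b_words: list[str]) -> dict[int, int]:
    """Map each matched word index in edition A to its index in edition B.

    Words inside variant regions have no entry, since they have no exact
    counterpart in the other edition.
    """
    matcher = SequenceMatcher(a=a_words, b=b_words, autojunk=False)
    mapping: dict[int, int] = {}
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            mapping[block.a + k] = block.b + k
    return mapping


if __name__ == "__main__":
    base = "En un lugar de la Mancha, de cuyo nombre no quiero acordarme"
    witness = "En vn lugar de la Mancha, de cuyo nombre no quiero acordarme"
    print(align_editions(base, witness))  # [('replace', 'un', 'vn')]
```

Because the position map is built only from matching blocks, words inside a variant region have no counterpart; a side-by-side viewer could fall back to the nearest mapped neighbor when scrolling two editions in step.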
A Web-based interface allows readers to customize unified editions of a text according to their intended use and preferences. The work was evaluated by professors and graduate students in the Humanities and Computer Science. The evaluation results show that the prototype system meets the expected requirements, although there is still room for further research. More broadly, we believe that the concepts, working procedures, and software techniques developed here will help address the issues of mid- to large-scale digitization of ancient documents across a wide variety of collections.