Index-Driven Digitization and Indexation of Historical Archives

The promise of digitizing historical archives lies in indexing them at the level of their contents. Unfortunately, manual indexation cannot keep pace with the speed of digitization. In this article we present a method to bootstrap the deployment of a content-based information system for digitized historical archives by relying on extant indexing tools. Such indexes were commonly prepared to search within homogeneous records while the archive was still in current use. We present a conceptual model to describe and manipulate historical indexing tools. We then introduce a systematic approach for using these tools to guide digitization campaigns and to index digitized historical records. Finally, we exemplify the approach with a case study on the indexation system of the X Savi alle Decime in Rialto, a Venetian magistracy in charge of levying a tax on real estate in early modern Venice and of keeping the related records.
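The abstract does not spell out the conceptual model. The following is therefore only a minimal sketch of the general idea of index-driven digitization, built on assumptions of our own. Each index entry (a hypothetical `IndexEntry`) carries locators that point to a register and a folio (a hypothetical `Locator`). Registers referenced by more entries are proposed for digitization first. Once images exist, entries are linked to them through a (register, folio) lookup. None of these names, the counting heuristic, or the sample data come from the article.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Locator:
    """Hypothetical pointer from an index entry to a physical record."""
    register: str  # illustrative register/volume identifier
    folio: int


@dataclass
class IndexEntry:
    """Hypothetical entry of a historical index (e.g. a name heading)."""
    heading: str
    locators: List[Locator] = field(default_factory=list)


def digitization_priority(entries: List[IndexEntry]) -> List[str]:
    """Rank registers by how many locators point into them (assumed heuristic).

    Ties are broken alphabetically so that the ordering is deterministic.
    """
    counts = Counter(loc.register for e in entries for loc in e.locators)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [register for register, _ in ranked]


def link_images(
    entries: List[IndexEntry], images: Dict[Tuple[str, int], str]
) -> Dict[str, List[str]]:
    """Attach to each heading the image ids its locators resolve to.

    `images` maps (register, folio) to an image identifier; locators whose
    folio has not been digitized yet are simply skipped.
    """
    linked: Dict[str, List[str]] = {}
    for e in entries:
        linked[e.heading] = [
            images[(loc.register, loc.folio)]
            for loc in e.locators
            if (loc.register, loc.folio) in images
        ]
    return linked


# Made-up sample data, for illustration only.
sample = [
    IndexEntry("Contarini", [Locator("B1", 3), Locator("B2", 10)]),
    IndexEntry("Dandolo", [Locator("B1", 7)]),
]
print(digitization_priority(sample))  # ['B1', 'B2']
print(link_images(sample, {("B1", 3): "b1_003.jpg", ("B1", 7): "b1_007.jpg"}))
# {'Contarini': ['b1_003.jpg'], 'Dandolo': ['b1_007.jpg']}
```

Counting locators per register is just one plausible way to let an existing index steer a digitization campaign. A real system would likely weight registers by research demand or by the structure of the index itself.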
