Classics in the Million Book Library

In October 2008, Google announced a settlement that will provide access to seven million scanned books, while the number of books freely available under an open license from the Internet Archive exceeded one million. The collections and services that classicists have created over the past generation place them in a strategic position to exploit the potential of these collections. This paper concludes with research topics relevant to all humanists: converting page images to text, one language to another, and raw text into machine-actionable data.
