Abstract In 1996 Smithsonian Libraries (SIL) embarked on the digitization of its collections. By 1999 a full-scale digitization center was in place, and rare volumes from the natural history collections, often of high illustrative value, were the focus of the program’s first years. The resulting beautiful books made available for online display were successful to a certain extent, but it soon became clear that the data locked within the texts needed to be converted to a more usable and re-purposable form, using digitization methods that went beyond simple page imaging and included text conversion. Library staff met with researchers from the taxonomic community to understand their path to the literature and identified the tools (indexes and bibliographies) they used to connect to library holdings. The traditional library metadata describing the titles, which made them easily retrievable from library shelves, was not meeting the needs of researchers looking for more detailed and granular data within the texts. The result was to identify suitable print tools that could potentially assist researchers once converted to digital form. This paper outlines the project undertaken to convert Charles Davies Sherborn’s Index Animalium into a tool that connects researchers to library holdings: from a print index, to a database, and eventually to a dataset. Sherborn’s microcitation of each species name and his bibliographies help bridge the gap between taxonomists and the literature holdings of libraries. In 2004, SIL received funding from the Smithsonian’s Atherton Seidell Endowment to create an online version of Sherborn’s Index Animalium. The initial project was to digitize the page images and re-key the data into a simple data structure. As the project evolved, a more complex database was developed, which enabled quality field searching to retrieve species names and to search the bibliography. 
Inconsistent abbreviations and styling in Sherborn’s bibliographies made the data difficult to parse. With the development of the Biodiversity Heritage Library (BHL) in 2005, it became obvious that there was a need to integrate the database-converted Index Animalium, BHL’s scanned taxonomic literature, and taxonomic intelligence (the algorithmic identification of binomial, Latinate name-strings). The challenges of working with legacy taxonomic citations, computer matching algorithms, and making connections have brought us to today’s goal of making Sherborn available and linked to other datasets. In partnership with others, and to allow machine-to-machine communication, the data is being examined for possible transformation into RDF markup that meets the standards of Linked Open Data. SIL staff have partnered with Thomson Reuters and the Global Names Initiative to further enhance the Index Animalium dataset. Thomson Reuters’ staff are now working on integrating the species microcitations and species names into the ION: Index to Organism Names project; Richard Pyle (The Bishop Museum) is also working on further parsing of the text. The ultimate goal of the Index Animalium collaborative project is to let researchers move seamlessly from a species name in either ION or the scanned pages of Index Animalium to the digitized original description in BHL, connecting taxonomic researchers to original authored species descriptions with just a click.