Extracting scientific articles from a large digital archive: BioStor and the Biodiversity Heritage Library

Background
The Biodiversity Heritage Library (BHL) is a large digital archive of legacy biological literature, comprising over 31 million pages scanned from books, monographs, and journals. During digitisation, basic metadata about each scanned item is recorded, but article-level metadata is not. Because the article is the standard unit of citation, this makes it difficult to locate cited literature in BHL. Being able to find articles easily would greatly enhance the value of the archive.

Description
A service was developed to locate articles in BHL by matching article metadata to BHL metadata using approximate string matching, regular expressions, and string alignment. This article-locating service is exposed as a standard OpenURL resolver on the BioStor web site (http://biostor.org/openurl/). The resolver can be used directly on the web or called by bibliographic tools that support OpenURL.

Conclusions
BioStor provides tools for extracting, annotating, and visualising articles from the Biodiversity Heritage Library. BioStor is available from http://biostor.org/.
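The Description names approximate string matching and regular expressions as ingredients of the matching step. The sketch below shows one way those two pieces could combine to pick a scanned BHL item for a citation. It is not BioStor's code: the record layout (`title`, `volume_info`), the volume-string regex, the use of `difflib.SequenceMatcher` as the similarity measure, and the 0.8 threshold are all assumptions chosen for illustration, and the sample records are made up.

```python
import re
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical pattern for BHL-style volume strings such as "v.12 (1898)",
# "vol. 7" or "Bd. 3"; real BHL volume strings are far more varied.
VOLUME_RE = re.compile(r"\b(?:v|vol|bd)\.\s*(\d+)", re.IGNORECASE)


def normalise(title: str) -> str:
    """Lower-case, replace punctuation with spaces, and collapse whitespace."""
    title = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(title.split())


def title_similarity(a: str, b: str) -> float:
    """Approximate string match: difflib similarity ratio in [0, 1]."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def parse_volume(volume_info: str) -> Optional[int]:
    """Return the first volume number found by the regex, or None."""
    match = VOLUME_RE.search(volume_info)
    return int(match.group(1)) if match else None


def best_item(citation: dict, items: list, threshold: float = 0.8) -> Optional[dict]:
    """Among items with the cited volume, return the one whose title is most
    similar to the cited journal, if that similarity reaches `threshold`."""
    scored = [
        (title_similarity(citation["journal"], item["title"]), item)
        for item in items
        if parse_volume(item["volume_info"]) == citation["volume"]
    ]
    scored = [pair for pair in scored if pair[0] >= threshold]
    # max() keeps the first item among equal scores.
    return max(scored, key=lambda pair: pair[0])[1] if scored else None


# Made-up records standing in for BHL item metadata.
items = [
    {"id": "a", "title": "The Annals and magazine of natural history", "volume_info": "v.12 (1898)"},
    {"id": "b", "title": "The Annals and magazine of natural history", "volume_info": "v.13 (1898)"},
    {"id": "c", "title": "Tijdschrift voor entomologie", "volume_info": "v.12"},
]
citation = {"journal": "Annals and Magazine of Natural History", "volume": 12}
print(best_item(citation, items)["id"])  # a
```

Filtering on the exact volume first and only then ranking by title similarity keeps the fuzzy comparison to a small candidate set. The `\b` in the regex stops words such as "Nov." from being read as a volume marker.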

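The abstract states that the article locator is exposed as an OpenURL resolver at http://biostor.org/openurl/, so any tool that can build an OpenURL query can ask it for an article. A minimal sketch of building such a query follows. The base URL comes from the text. The key names follow the OpenURL 1.0 (Z39.88-2004) key/encoded-value journal format, and it is an assumption that the resolver accepts exactly these keys. The citation values are illustrative placeholders, not a real article.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Base URL taken from the abstract; the parameter names below are an
# assumption based on the OpenURL 1.0 KEV journal format.
BIOSTOR_OPENURL = "http://biostor.org/openurl/"


def build_openurl(journal: str, volume: int, spage: int, year: int, atitle: str = "") -> str:
    """Build an OpenURL query string describing a journal article."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft.jtitle": journal,
        "rft.volume": str(volume),
        "rft.spage": str(spage),
        "rft.date": str(year),
    }
    if atitle:  # the article title is optional
        params["rft.atitle"] = atitle
    # urlencode percent-encodes each value and joins the pairs with "&".
    return BIOSTOR_OPENURL + "?" + urlencode(params)


# Illustrative values only; they do not identify a specific article.
url = build_openurl("Annals and Magazine of Natural History", 12, 345, 1903)
print(url)
# Decoding the query again shows the values survive the round trip.
print(parse_qs(urlsplit(url).query)["rft.volume"])  # ['12']
```

Fetching the URL (for example with `urllib.request.urlopen`) is left out because the abstract does not describe the format of the resolver's response.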