Automatic Extraction of Reference Linking Information from Online Documents

The Web, with its explosive growth, is becoming an efficient source of up-to-date information for the scientific researcher. Informal online archives serve as repositories for technical reports. Proceedings are increasingly published on the Web. The collection of online journals is growing, and a good number of them are "born digital". Many researchers simply post their papers on their own web sites. With so much material online, it is highly desirable to be able to access cited documents directly from the citing paper. Implementing this direct access is called "reference linking". Some reference linking services exist today. A number of commercial publishers, recognizing the added value of reference linking, have banded together to form the CrossRef organization. The CrossRef publishers share their metadata, which enables them to interlink their journals. This metadata is not, however, available without a fee to organizations or individuals outside of CrossRef. The vast majority of online scholarly literature is accompanied by little or no metadata. Since it is desirable to link this literature as well, automatically reference linking online scholarly literature in the absence of metadata and author intervention is a problem well worth considering. This paper explores the problem in detail and presents some algorithms for extracting metadata from online texts and linking full-text documents together. Its main topic is the extent to which reference linking of the online literature can be done automatically.
