Reference Linking the Web's Scholarly Papers

Along with the explosive growth of the Web has come a great increase in on-line scholarly literature. The Web is thus becoming an efficient source of up-to-date information for scientific researchers, and more and more researchers are turning to their computers to keep current on results in their field. Not only is Web retrieval usually faster than a walk to the library, but the information obtained from the Web is potentially more current than what appears in print. The growing proportion of on-line scholarly literature makes it possible to implement functionality desirable to all researchers: the ability to access cited documents immediately from the citing paper. Implementing this direct access is called "reference linking". While many authors insert explicit links into their papers to support reference linking, the practice is by no means universal.

The approach taken by the Digital Library Research Group at Cornell employs "value-added surrogates" to enhance the reference-linking behavior of Web documents. Given the URL of an on-line paper, a surrogate object is constructed for that paper. The surrogate fetches the content of the document and parses it to extract reference-linking data automatically. Applications can then access this data, encoded in XML, through a well-defined Java API. We use this API to reference link D-Lib Magazine, an on-line journal of technical papers on digital library research. Currently we automatically extract reference-linking information from the papers in this journal with 80% accuracy.
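The surrogate idea (URL in, fetched and parsed document, reference data out as XML through a Java API) can be illustrated with a minimal sketch. This is not the Cornell group's actual API, which the text does not spell out. The class `ReferenceSurrogate`, the nested `Reference` record, the methods `fetch`, `getReferences` and `toXml`, and the XML element names are all hypothetical. The parser also assumes, purely for illustration, that a paper marks its reference section with an HTML heading reading "References" and numbers entries as `[1]`, `[2]`, and so on; real papers vary far more than this naive heuristic allows.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical surrogate: wraps one on-line paper and exposes its extracted references. */
public class ReferenceSurrogate {

    /** One extracted reference: its citation number and raw text. */
    public record Reference(int number, String text) {}

    // Assumed conventions (hypothetical): a "References" HTML heading, then entries tagged [1], [2], ...
    private static final Pattern HEADING =
            Pattern.compile("(?i)<h[1-6][^>]*>\\s*references\\s*</h[1-6]>");
    private static final Pattern ENTRY = Pattern.compile("\\[(\\d+)\\]\\s*([^\\[]+)");

    private final String url;
    private final List<Reference> references = new ArrayList<>();

    /** Builds a surrogate from HTML that has already been fetched (handy for offline testing). */
    public ReferenceSurrogate(String url, String html) {
        this.url = url;
        extractReferences(html);
    }

    /** Fetches the paper over HTTP, then parses it into a surrogate. */
    public static ReferenceSurrogate fetch(String url) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return new ReferenceSurrogate(url, response.body());
    }

    private void extractReferences(String html) {
        Matcher heading = HEADING.matcher(html);
        if (!heading.find()) {
            return; // no recognizable reference section: leave the list empty
        }
        // Strip tags and decode the three basic entities (&amp; last, so nothing is decoded twice).
        String text = html.substring(heading.end())
                .replaceAll("<[^>]+>", " ")
                .replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&");
        Matcher entry = ENTRY.matcher(text);
        while (entry.find()) {
            String body = entry.group(2).trim().replaceAll("\\s+", " ");
            references.add(new Reference(Integer.parseInt(entry.group(1)), body));
        }
    }

    public String getUrl() { return url; }

    /** Returns an unmodifiable copy of the extracted references. */
    public List<Reference> getReferences() { return List.copyOf(references); }

    /** Serializes the reference-linking data as a small XML document. */
    public String toXml() {
        StringBuilder sb = new StringBuilder();
        sb.append("<surrogate url=\"").append(escape(url)).append("\">\n");
        for (Reference r : references) {
            sb.append("  <reference n=\"").append(r.number()).append("\">")
              .append(escape(r.text())).append("</reference>\n");
        }
        return sb.append("</surrogate>\n").toString();
    }

    // Escapes the characters that are unsafe in XML text and attribute values.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

Keeping the network fetch (`fetch`) separate from parsing (the constructor) lets an application build a surrogate from a live URL, while the extraction logic can still be exercised on stored HTML without network access.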