Exploiting Coauthorship to Infer Topicality in a Digital Library of Computer Science Technical Reports

We propose a method for mapping the topical content of distributed digital libraries and demonstrate it using data from the Networked Computer Science Technical Report Library (NCSTRL) project. The method uses information derived from document coauthorship to improve automatic subject classification of the documents. In a distributed digital library, these subject classifications help characterize both intra-site and inter-site content, and they also support secondary retrieval services. We present the method and describe an experiment whose results show that it can produce better clusterings than traditional document clustering.
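The abstract does not say how coauthorship evidence is combined with document text, so the following is only a minimal sketch of one plausible arrangement, not the method used in the paper. It assumes each document is a dictionary with hypothetical `text` and `authors` fields, measures textual similarity with term-frequency cosine, measures coauthorship with Jaccard overlap of author sets, and blends the two with a hypothetical weight `alpha`. The resulting score could feed any standard clustering algorithm.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(x: set, y: set) -> float:
    """Share of authors two documents have in common."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def combined_similarity(doc_a: dict, doc_b: dict, alpha: float = 0.5) -> float:
    """Blend text and coauthorship similarity.

    alpha is an illustrative assumption: 0 uses text only, 1 uses authors only.
    """
    text_sim = cosine(
        Counter(doc_a["text"].lower().split()),
        Counter(doc_b["text"].lower().split()),
    )
    author_sim = jaccard(set(doc_a["authors"]), set(doc_b["authors"]))
    return (1 - alpha) * text_sim + alpha * author_sim
```

With `alpha = 0` the score reduces to ordinary text-based similarity, which corresponds to the traditional clustering baseline the abstract compares against; raising `alpha` lets shared authorship pull together reports whose vocabularies differ.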
