Coefficients of combining concept classes in a collection

This report considers how combining different kinds of information can improve retrieval. The vector space model has been extended so that different classes of data are associated with distinct concept types, each with its own subvector. Two collections with multiple concept types are described: ISI-1460 and CACM-3204. Experiments indicate that regression methods can help predict relevance, given query-document similarity values for each concept type. After sampling and transformation of the data, the coefficient of determination for the best model was 0.48 for ISI and 0.66 for CACM. With probabilistic feedback, average precision was 11% better for ISI and 31% better for CACM when all concept types were used rather than terms only. These findings may be of particular interest to designers of document retrieval or hypertext systems, since links are shown to be especially beneficial.
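The regression idea in the abstract can be illustrated with a minimal sketch. It is not the report's actual procedure: it assumes cosine similarity per subvector and plain ordinary least squares, and it omits the sampling and data transformation the abstract mentions. The function names and the concept-type keys `"terms"` and `"links"` are hypothetical, chosen only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length subvectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarities(query, doc, concept_types):
    """One similarity value per concept type, computed on the matching subvectors."""
    return [cosine(query[t], doc[t]) for t in concept_types]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.
    Assumes a is square and non-singular (no check is made)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_coefficients(rows, relevance):
    """Least-squares fit of relevance ~ b0 + sum_k b_k * sim_k.
    rows: one list of per-concept-type similarities per query-document pair.
    Returns (coefficients, coefficient of determination R^2).
    Assumes the relevance values are not all identical."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept column
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, relevance)) for i in range(p)]
    coef = solve(xtx, xty)  # normal equations: (X^T X) b = X^T y
    predicted = [sum(c * v for c, v in zip(coef, d)) for d in design]
    mean = sum(relevance) / len(relevance)
    ss_res = sum((y - f) ** 2 for y, f in zip(relevance, predicted))
    ss_tot = sum((y - mean) ** 2 for y in relevance)
    return coef, 1.0 - ss_res / ss_tot

def combined_similarity(coef, sims):
    """Score a query-document pair with the fitted coefficients."""
    return coef[0] + sum(c * s for c, s in zip(coef[1:], sims))
```

In this sketch, each query and document is a dictionary mapping a concept type to its subvector. `similarities` yields one regression input per concept type, `fit_coefficients` estimates the weights from pairs with known relevance judgments, and `combined_similarity` applies those weights to rank unseen documents.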
