Exploiting citation overlaps for Information Retrieval: Generating a boomerang effect from the network of scientific papers

A new citation search strategy for Information Retrieval (IR) is proposed, based on the principle of polyrepresentation (Ingwersen, 1992, 1996). The strategy exploits, in a structured manner, logical overlaps between a range of cognitively different interpretations of the same documents, i.e. so-called cognitive overlaps of representations. It is essentially a "cycling strategy". It starts with documents retrieved by a subject search, and from these it automatically identifies new documents by following the network of citations in scientific papers backwards and forwards in time. Earlier citation search strategies require known relevant documents (seed documents) as a starting point. The proposed strategy does not, because it can start from a subject search instead. A pilot study is reported in which the ability of the strategy to retrieve additional relevant documents is analysed. Results show that the strategy can retrieve a very large number of documents, and that these may be segmented into a number of distinct "overlap levels". It is demonstrated that the combined core of the higher-level overlaps has a higher relevance density than the original retrieval results. Based on these results, it is suggested that documents be displayed in order of their presence in higher-level overlaps. This maximises the chance that as many relevant documents as possible are presented to the user first.
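The following minimal sketch illustrates the cycling idea. It makes several simplifying assumptions that are not taken from the paper. The citation network is assumed to be a toy mapping from each paper to the papers it cites. The starting set stands in for the result of a subject search. A retrieved document's overlap level is taken to be the number of starting documents that reach it through one backward step (a reference) or one forward step (a citing paper). The paper's own definition of overlap levels may differ, and it may involve more than one step. The function names and paper IDs are hypothetical.

```python
from collections import defaultdict

# Hypothetical representation: each paper ID maps to the set of IDs it cites.
CitationGraph = dict[str, set[str]]


def invert(references: CitationGraph) -> CitationGraph:
    """Build the forward direction: for each paper, the set of papers citing it."""
    cited_by: CitationGraph = defaultdict(set)
    for paper, refs in references.items():
        for ref in refs:
            cited_by[ref].add(paper)
    return dict(cited_by)


def overlap_levels(seeds: set[str], references: CitationGraph) -> dict[str, int]:
    """For every paper reached in one citation step (backwards via the seed's
    references, or forwards via papers citing the seed), count how many
    distinct seed papers reach it. This count is an assumed stand-in for
    the paper's "overlap level"."""
    cited_by = invert(references)
    level: dict[str, int] = defaultdict(int)
    for seed in seeds:
        # Set union: a paper linked to the same seed in both directions
        # is still counted only once for that seed.
        neighbours = references.get(seed, set()) | cited_by.get(seed, set())
        for doc in neighbours:
            level[doc] += 1
    return dict(level)


def rank_by_overlap(levels: dict[str, int]) -> list[str]:
    """Order papers so that those in higher overlap levels come first;
    ties are broken alphabetically to keep the output deterministic."""
    return sorted(levels, key=lambda doc: (-levels[doc], doc))


if __name__ == "__main__":
    refs = {
        "A": {"X", "Y"},
        "B": {"X", "Z"},
        "C": {"X", "Y"},
        "D": {"A", "B"},
    }
    seeds = {"A", "B", "C"}  # stands in for the subject-search result
    levels = overlap_levels(seeds, refs)
    print(rank_by_overlap(levels))  # → ['X', 'D', 'Y', 'Z']
```

In this toy network, X is cited by all three starting papers and so lands in the highest overlap level. D cites two of them, and Y is cited by two. Z is linked to only one starting paper. Displaying the list in this order matches the suggestion above to present documents from higher-level overlaps first.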

[1] Peter Ingwersen, et al. Data set isolation for bibliometric online analyses of research publications: fundamental methodological issues, 1997.

[2] Ronald Rousseau, et al. The publication-citation matrix and its derived quantities, 2001.

[3] Miranda Lee Pao, et al. Term and Citation Retrieval: A Field Study, 1993, Inf. Process. Manag.

[4] Katherine W. McCain, et al. Descriptor and citation retrieval in the medical behavioral sciences literature: retrieval overlaps and novelty distribution, 1989.

[5] Pia Borlund, et al. Experimental components for the evaluation of interactive information retrieval systems, 2000, J. Documentation.

[6] E. Garfield, et al. "Science Citation Index": A New Dimension in Indexing, 1964, Science.

[7] Anthony E. Cawkell, et al. Search strategy, construction and use of citation networks, with a socio-scientific example: "Amorphous semi-conductors and S.R. Ovshinsky", 1974, J. Am. Soc. Inf. Sci.

[8] Cyril W. Cleverdon, et al. Factors determining the performance of indexing systems, 1966.

[9] David Ellis, et al. A Behavioural Approach to Information Retrieval System Design, 1989, J. Documentation.

[10] Cyril W. Cleverdon, et al. Aslib Cranfield research project: Factors determining the performance of indexing systems; Volume 1, Design; Part 2, Appendices, 1966.

[11] E. Garfield, et al. Citation indexes for science: a new dimension in documentation through association of ideas, 1955, Science.

[12] Carol Tenopir, et al. Full text database retrieval performance, 1985.

[13] Carol Tenopir, et al. Issues in online database searching, 1989.

[14] Peter Ingwersen, et al. Information Retrieval Interaction, 1992.

[15] Carol C. Spencer, et al. Subject searching with Science Citation Index: Preparation of a drug bibliography using Chemical Abstracts, Index Medicus, and Science Citation Index 1961 and 1964, 1967.

[16] Peter Ingwersen, et al. Cognitive Perspectives of Information Retrieval Interaction: Elements of a Cognitive IR Theory, 1996, J. Documentation.

[17] W. Bruce Croft, et al. I³R: a new approach to the design of document retrieval systems, 1987.

[18] F. W. Lancaster, et al. Indexing and abstracting in theory and practice, 1991.

[19] F. C. Thorne. The citation index: Another case of spurious validity, 1977.

[20] Anthony E. Cawkell. Methods of information retrieval using Web of Science: pulmonary hypertension as a subject example, 2000, J. Inf. Sci.

[21] Jeffrey Katzer, et al. A study of the overlap among document representations, 1983, SIGIR '83.

[22] Anthony E. Cawkell. Checking research progress on 'image retrieval by shape-matching' using the Web of Science, 1998, Aslib Proc.

[23] D. Case, et al. How can we investigate citation behavior? A study of reasons for citing literature in communication, 2000.

[24] E. Garfield, et al. Can Citation Indexing Be Automated?, 1964.

[25] Blaise Cronin, et al. The citation process: The role and significance of citations in scientific communication, 1984.

[26] Pia Borlund, et al. Evaluation of interactive information retrieval systems, 2000.

[27] Manfred Kochen, et al. Principles of information retrieval, 1974.

[28] Leo Egghe, et al. Co-citation, bibliographic coupling and a characterization of lattice citation networks, 2002, Scientometrics.