Comparing citation contexts for information retrieval

In previous work, we showed that indexing a cited paper with terms taken from around its citations in citing papers, in addition to the cited paper's own terms, can improve retrieval effectiveness. Here, we investigate how to select text from around the citations in order to extract good index terms. We compare the retrieval effectiveness that results from a range of contexts around the citations, including no context, the entire citing paper, some fixed windows and several linguistically motivated variations. We conclude by analysing the benefits of more complex, linguistically motivated methods for extracting citation index terms over a fixed window of terms, and speculate that computational linguistic techniques may offer some advantage for this task.
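
To make the idea of a fixed-window citation context concrete, the following is a minimal sketch, not the method used in the paper. It assumes numeric citation markers such as "[9]", whitespace tokenisation, and a symmetric window; the function name `fixed_window_terms`, the `window` parameter, and the example sentence are all hypothetical.

```python
import re

# Assumption: citations appear as numeric markers such as "[12]".
CITATION = re.compile(r"\[(\d+)\]")


def fixed_window_terms(text, window=3):
    """Collect, for each cited reference number, the terms that fall
    within `window` tokens on either side of its citation marker."""
    tokens = text.split()  # naive whitespace tokenisation (assumption)
    contexts = {}
    for i, token in enumerate(tokens):
        match = CITATION.search(token)
        if match is None:
            continue
        ref = int(match.group(1))
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        # Drop other citation markers, strip punctuation, lowercase.
        terms = [
            t.strip(".,;:()").lower()
            for t in left + right
            if not CITATION.search(t)
        ]
        contexts.setdefault(ref, []).extend(t for t in terms if t)
    return contexts


# Hypothetical citing sentence, invented for illustration only.
example = (
    "Anchor text improves web search [9] and citing "
    "statements aid retrieval [5] ."
)
contexts = fixed_window_terms(example, window=3)
```

In a setup like the one described above, the terms collected for each reference would be added to the index entry of the cited paper alongside its own terms. A linguistically motivated variant might instead cut the context at sentence boundaries rather than at a fixed token count.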
