Natural language processing for information retrieval

The paper summarizes the essential properties of document retrieval and reviews both conventional practice and research findings, the latter suggesting that simple statistical techniques can be effective. It then considers the new opportunities and challenges presented by the user’s ability to search full text directly (rather than, e.g., titles and abstracts), and suggests appropriate approaches to full-text searching, with a focus on the potential role of natural language processing. The paper also comments on possible connections with data and knowledge retrieval, and concludes by emphasizing the importance of rigorous performance testing.
