What is the Role of NLP in Text Retrieval?

This paper addresses the value of linguistically motivated indexing (LMI) for document and text retrieval. After reviewing the basic concepts involved and the assumption on which LMI is based, namely that complex index descriptions and terms are necessary, I consider past and recent research on LMI, and specifically on automated LMI via NLP. Experiments in the first phase of research, up to the late eighties, did not demonstrate value in LMI, but they were very limited; the much larger full-text tests of the nineties, however, have not done so either. My conclusion is that LMI is not needed for effective retrieval but has other important roles within information-selection systems.
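To make the idea of "complex index terms" concrete, the following minimal sketch contrasts single-word index terms with two-word compound terms. It is an illustration under stated assumptions, not a method from this paper: the stopword list, the function names (`single_terms`, `compound_terms`, `overlap`) and the toy documents are hypothetical, and adjacent word pairs are used as a crude non-syntactic stand-in for the phrases that an NLP-based indexer would derive.

```python
import re

# Hypothetical, deliberately tiny stopword list (an assumption for illustration).
STOPWORDS = {"the", "of", "and", "a", "in", "for", "to", "is"}


def single_terms(text: str) -> list[str]:
    """Simple index terms: lowercased alphabetic tokens minus stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def compound_terms(text: str) -> list[str]:
    """Hypothetical 'complex' index terms: adjacent pairs of content words.

    A non-syntactic approximation only; a linguistically motivated indexer
    would use parsing rather than simple adjacency.
    """
    words = single_terms(text)
    return [f"{a} {b}" for a, b in zip(words, words[1:])]


def overlap(query_terms: list[str], doc_terms: list[str]) -> int:
    """Count of distinct index terms shared by query and document."""
    return len(set(query_terms) & set(doc_terms))


query = "information retrieval"
docs = {
    "d1": "Information retrieval systems for libraries",
    "d2": "Retrieval of information about weather systems",
}

for name, text in docs.items():
    # Columns: document, single-term overlap, compound-term overlap.
    print(name,
          overlap(single_terms(query), single_terms(text)),
          overlap(compound_terms(query), compound_terms(text)))
# d1 2 1
# d2 2 0
```

Both toy documents match the query equally on single terms, while the compound term separates them. Whether this kind of discrimination actually improves retrieval effectiveness on realistic collections is the question the experiments reviewed in this paper address.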