Information Retrieval: Still Butting Heads with Natural Language Processing?

Information retrieval (IR) is about finding documents that may be relevant to a user's query within a corpus or collection of texts. Although it appears simple at first glance, IR is in fact a hard problem because of the subtleties introduced by the use of natural language in both documents and queries. Because natural language is the dominant medium throughout the IR task, automatic natural language processing (NLP) clearly has significant potential for improving retrieval. Information extraction (IE) is also fundamentally about dealing with natural language, albeit for a different function. It is thus of interest to the IE community to see how a related task, perhaps the most closely related task, IR, has managed to use the same base NLP technology in its development so far. The comparison is especially valid because IR has been the subject of research and development, and has been delivering working solutions, for many decades, whereas IE is a more recent, emerging technology.
