Noun-Phrase Analysis in Unrestricted Text for Information Retrieval

Information retrieval is an important application area of natural-language processing, one that poses the genuine challenge of processing large quantities of unrestricted natural-language text. This paper reports on the application of a few simple yet robust and efficient noun-phrase analysis techniques to create better indexing phrases for information retrieval. In particular, we describe a hybrid approach to extracting meaningful (continuous or discontinuous) subcompounds from complex noun phrases using both corpus statistics and linguistic heuristics. Experimental results show that indexing based on such extracted subcompounds improves both recall and precision in an information retrieval system. The noun-phrase analysis techniques are also potentially useful for book indexing and automatic thesaurus extraction.
