Recognition and classification of noun phrases in queries for effective retrieval

It has been shown that using phrases properly in document retrieval leads to higher retrieval effectiveness. In this paper, we define four types of noun phrases and present an algorithm for recognizing these phrases in queries. The algorithm combines the strengths of several existing tools for phrase recognition. We test it on a set of 500 web queries from a query log and a set of 238 TREC queries, and the experimental results show that it yields high phrase recognition accuracy. For comparison, we also apply a baseline noun phrase recognition algorithm to the TREC queries. We then conduct a document retrieval experiment using the TREC queries (1) without any phrases, (2) with the phrases recognized by the baseline algorithm, and (3) with the phrases recognized by our algorithm. The retrieval effectiveness of (3) is better than that of (2), which in turn is better than that of (1). This demonstrates that utilizing phrases in queries improves retrieval effectiveness, and that better noun phrase recognition yields higher retrieval performance.
