Head/Modifier Frames for Information Retrieval

We describe a principled method for representing documents by phrases abstracted into Head/Modifier (HM) pairs. First, the notion of aboutness and the characterization of full-text documents by HM pairs are discussed. Based on linguistic arguments, a taxonomy of HM pairs is derived. We briefly describe the EP4IR parser/transducer for English and present some statistics on the distribution of HM pairs in newspaper text.
