Normalization and Matching in the DORO System

This paper is concerned with the use of linguistically motivated phrases as indexing terms in Information Retrieval applications. Apart from the conventional noun phrases, we propose to use verb phrases as index terms for text classification. Techniques for phrase matching through syntactic normalization and semantical matching are described. We discuss the realization of the syntactic normalization of phrases by transduction to frames. Semantical normalization is based on lexico-semantical relations, taking into account certain properties of the classification algorithms used. The ideas described here are being implemented in the Document Routing system DORO, in which statistical learning algorithms are applied to document profiles consisting of phrases. This paper describes the rationale behind work in progress, rather than presenting final results

[1]  German Rigau,et al.  Book Reviews: EuroWordNet: A Multilingual Database with Lexical Semantic Networks , 1999, CL.

[2]  C. Koster Aax Grammars for Natural Languages , 1991 .

[3]  Peter-Arno Coppen A new version of the AMAZON/CASUS system , 1994 .

[4]  John B. Shoven,et al.  I , Edinburgh Medical and Surgical Journal.

[5]  Ido Dagan,et al.  Mistake-Driven Learning in Text Categorization , 1997, EMNLP.

[6]  Joel L Fagan,et al.  Experiments in Automatic Phrase Indexing For Document Retrieval: A Comparison of Syntactic and Non-Syntactic Methods , 1987 .

[7]  Julio Gonzalo,et al.  Indexing with WordNet synsets can improve text retrieval , 1998, WordNet@ACL/COLING.

[8]  Alexander Dekhtyar,et al.  Information Retrieval , 2018, Lecture Notes in Computer Science.

[9]  Yiming Yang,et al.  A Comparative Study on Feature Selection in Text Categorization , 1997, ICML.

[10]  Avi Arampatzis,et al.  Phase-Based Information Retrieval , 1998, Inf. Process. Manag..

[11]  Gregory Grefenstette,et al.  CLARIT TREC Design, Experiments, and Results , 1992, TREC.

[12]  Tomek Strzalkowski Natural Language Information Retrieval , 1995, Inf. Process. Manag..

[13]  Yoram Singer,et al.  Context-sensitive learning methods for text categorization , 1996, SIGIR '96.

[14]  Cornelis H. A. Koster,et al.  Four text classification algorithms compared on a Dutch corpus , 1998, SIGIR '98.

[15]  Martin F. Porter,et al.  An algorithm for suffix stripping , 1997, Program.

[16]  J. J. Rocchio,et al.  Relevance feedback in information retrieval , 1971 .

[17]  Tomek Strzalkowski,et al.  Natural Language Information Retrieval: TREC-8 Report , 1994, TREC.

[18]  Avi Arampatzis,et al.  Phrase-based Information Retrieval , 1998 .

[19]  Cornelis H. A. Koster Affix Grammars for Natural Languages , 1991, Attribute Grammars, Applications and Systems.

[20]  Alan F. Smeaton,et al.  Experiments on using semantic distances between words in image caption retrieval , 1996, SIGIR '96.

[21]  Alan F. Smeaton,et al.  Using NLP or NLP Resources for Information Retrieval Tasks , 1999 .