Query Expansion and Classification of Retrieved Documents

This paper presents the methods tested by the University of Avignon and Bertin at the TREC-7 evaluation. The first section describes the methodologies used for query expansion: synonymy and stemming. Relevance feedback is applied to both the TIPSTER corpora and Internet documents. The second section describes a classification algorithm based on hierarchical and clustering methods. This algorithm improves the results of any Information Retrieval system that returns a list of documents for a query, and it helps users by automatically building a structured document map from the set of retrieved documents. Lastly, we present the first results obtained with this algorithm on the TREC-6 and TREC-7 corpora and queries.

Keywords: ad-hoc information retrieval, automatic relevance feedback, synonymy, automatic classification, cluster-based and hierarchical methods.
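To make the idea of grouping a retrieved list concrete, the following is a minimal illustrative sketch and not the algorithm evaluated in the paper. It assumes bag-of-words term-frequency vectors, cosine similarity, and single-link agglomerative merging with a fixed threshold. The function names (`vectorize`, `cosine`, `cluster_retrieved`) and the `threshold` value are hypothetical choices made for this example only.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term-frequency vector (assumption: lowercased whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def cluster_retrieved(docs, threshold=0.3):
    """Single-link agglomerative grouping of a retrieved list.

    Repeatedly merges the two most similar clusters until no pair
    reaches `threshold` (a hypothetical cut-off). Returns clusters
    as sorted lists of indices into `docs`.
    """
    vecs = [vectorize(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]

    def link(c1, c2):
        # Single link: similarity of the closest pair of members.
        return max(cosine(vecs[i], vecs[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        best, (x, y) = max(
            ((link(clusters[x], clusters[y]), (x, y))
             for x in range(len(clusters))
             for y in range(x + 1, len(clusters))),
            key=lambda p: p[0],
        )
        if best < threshold:
            break
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]
    return clusters


retrieved = [
    "query expansion with synonyms",
    "hierarchical clustering of documents",
    "synonyms for query expansion",
    "clustering retrieved documents",
]
print(cluster_retrieved(retrieved))
```

Each resulting group can be read as one node of a simple document map. A real system would typically use weighted terms (for example tf-idf) and stemming before computing similarities.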
