Automatic Selection of Noun Phrases as Document Descriptors in an FCA-Based Information Retrieval System

Automatic attribute selection is a critical step when using Formal Concept Analysis (FCA) in a free-text document retrieval framework. Attributes that serve well as document descriptors should produce smaller, clearer and more browsable concept lattices with better clustering features. In this paper we focus on the automatic selection of noun phrases as document descriptors to build an FCA-based information retrieval (IR) framework. We present three phrase selection strategies, which are evaluated using the Lattice Distillation Factor and Minimal Browsing Area measures. Noun phrases are shown to produce lattices with good clustering properties, with the added advantage over simple terms of being better intensional descriptors from the user's point of view.
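To make the setting concrete, the following minimal sketch shows how a concept lattice arises once each document is described by a set of noun phrases. It is an illustration only, not the selection strategies or evaluation measures of the paper: the toy context, the document names and the function names are all hypothetical, and the brute-force enumeration is only practical for very small collections.

```python
from itertools import combinations

# Hypothetical toy formal context: each document (object) is described
# by the noun phrases (attributes) selected for it.
context = {
    "d1": {"concept lattice", "information retrieval"},
    "d2": {"concept lattice", "noun phrase"},
    "d3": {"information retrieval", "noun phrase"},
    "d4": {"concept lattice", "information retrieval", "noun phrase"},
}


def common_attributes(docs, context, all_attrs):
    """Return the attributes shared by every document in `docs`.

    For an empty set of documents this is the full attribute set.
    """
    shared = set(all_attrs)
    for doc in docs:
        shared &= context[doc]
    return frozenset(shared)


def matching_documents(attrs, context):
    """Return the documents that carry every attribute in `attrs`."""
    return frozenset(doc for doc, doc_attrs in context.items() if attrs <= doc_attrs)


def formal_concepts(context):
    """Enumerate all formal concepts (extent, intent) by brute force.

    Every subset of documents is closed into a concept; duplicates
    collapse because concepts are stored in a set.
    """
    all_attrs = set().union(*context.values())
    docs = sorted(context)
    concepts = set()
    for size in range(len(docs) + 1):
        for subset in combinations(docs, size):
            intent = common_attributes(subset, context, all_attrs)
            extent = matching_documents(intent, context)
            concepts.add((extent, intent))
    return concepts


concepts = formal_concepts(context)
# Print concepts from the most general (largest extent) to the most specific.
for extent, intent in sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1]))):
    print(sorted(extent), sorted(intent))
```

Each printed pair is one lattice node: the intent is the set of phrases a user would see as the node's description, and the extent is the set of documents reached by browsing to it. Fewer, more discriminating descriptors yield fewer nodes, which is the sense in which attribute selection affects lattice size and browsability.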
