Identification and Treatment of Multiword Expressions Applied to Information Retrieval

The extensive use of Multiword Expressions (MWEs) in natural language texts calls for more detailed studies aimed at treating these expressions adequately. An MWE typically expresses a concept or idea that usually cannot be conveyed by a single word. Intuitively, appropriate treatment of MWEs should improve the results of an Information Retrieval (IR) system. The aim of this paper is to apply techniques for the automatic extraction of MWEs from corpora and to index each extracted MWE as a single unit. Experimental results show improvements in the retrieval of relevant documents when MWEs are identified and treated as single indexing units.
