Combining Association Measures for Collocation Extraction

We introduce the possibility of combining lexical association measures and present empirical results of several methods employed in automatic collocation extraction. First, we present a comprehensive overview of association measures and their performance on manually annotated data, evaluated by precision-recall graphs and mean average precision. Second, we describe several classification methods for combining association measures, followed by their evaluation and comparison with individual measures. Finally, we propose a feature selection algorithm that significantly reduces the number of combined measures with only a small degradation in performance.
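The abstract names four technical ingredients: scoring word pairs with association measures, evaluating the resulting ranking by average precision, combining measures with a classifier, and selecting a smaller subset of measures. The Python sketch below illustrates them under stated assumptions. It uses only two example measures (pointwise mutual information and the t-score) computed from a 2x2 contingency table. It uses logistic regression as one possible combination method and a generic greedy backward elimination as the feature selection step. The function names, these method choices, and the toy counts and labels are hypothetical and illustrative. They are not taken from the paper and do not reproduce its algorithm or data. Mean average precision would be the mean of such average precision values, for instance across evaluation folds. For brevity, the elimination step scores each subset on the training data itself, whereas a real setup would use held-out data.

```python
import math
from typing import List, Tuple

# Hypothetical 2x2 contingency table for a word pair (x, y):
# (f11, f12, f21, f22) = counts of (x y), (x, not y), (not x, y), (neither).
Table = Tuple[int, int, int, int]


def _expected_f11(t: Table) -> float:
    """Expected count of (x y) under independence: f1. * f.1 / N."""
    f11, f12, f21, f22 = t
    n = f11 + f12 + f21 + f22
    return (f11 + f12) * (f11 + f21) / n


def pmi(t: Table) -> float:
    """Pointwise mutual information, log2(observed / expected); needs f11 > 0."""
    return math.log2(t[0] / _expected_f11(t))


def t_score(t: Table) -> float:
    """Student's t-score approximation: (observed - expected) / sqrt(observed)."""
    return (t[0] - _expected_f11(t)) / math.sqrt(t[0])


def average_precision(scores: List[float], labels: List[int]) -> float:
    """Mean of the precision values at each true collocation in the ranked list."""
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def combined_score(w: List[float], x: List[float]) -> float:
    """Linear score w0 + w.x; ranking by it equals ranking by the probability."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def train_logistic(features: List[List[float]], labels: List[int],
                   lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Fit logistic regression weights (bias first) by batch gradient ascent."""
    w = [0.0] * (len(features[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(features, labels):
            err = y - _sigmoid(combined_score(w, x))
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi + lr * g / len(features) for wi, g in zip(w, grad)]
    return w


def backward_eliminate(features: List[List[float]], labels: List[int],
                       keep: int) -> List[int]:
    """Greedily drop the measure whose removal costs the least AP until `keep` remain."""
    active = list(range(len(features[0])))
    while len(active) > keep:
        best_ap, best_drop = -1.0, active[0]
        for j in active:
            cols = [k for k in active if k != j]
            sub = [[x[k] for k in cols] for x in features]
            w = train_logistic(sub, labels)
            ap = average_precision([combined_score(w, x) for x in sub], labels)
            if ap > best_ap:
                best_ap, best_drop = ap, j
        active.remove(best_drop)
    return active


# Toy, made-up counts (N = 10,000 each) and labels (1 = collocation).
TABLES = [(30, 20, 10, 9940), (25, 75, 15, 9885), (5, 995, 495, 8505),
          (2, 498, 998, 8502), (40, 10, 60, 9890), (3, 297, 697, 9003)]
LABELS = [1, 1, 0, 0, 1, 0]
MEASURES = [pmi, t_score]

if __name__ == "__main__":
    feats = [[m(t) for m in MEASURES] for t in TABLES]
    for m in MEASURES:
        print(m.__name__, average_precision([m(t) for t in TABLES], LABELS))
    w = train_logistic(feats, LABELS)
    print("combined", average_precision([combined_score(w, x) for x in feats], LABELS))
    print("kept measures", backward_eliminate(feats, LABELS, keep=1))
```

Running the module prints the average precision of each single measure and of the logistic combination, followed by the indices of the measures kept. On this separable toy data every ranking is perfect, so the example demonstrates only the mechanics, not the performance differences the paper reports.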