Exploring Extensive Linguistic Feature Sets in Near-Synonym Lexical Choice

In the near-synonym lexical choice task, the best alternative from a set of near-synonyms is selected to fill a lexical gap in a text. We experiment with an extensive set of over 650 linguistic features to represent the context of a word, together with a range of machine learning approaches to the lexical choice task. We extend previous work by experimenting with unsupervised and semi-supervised methods, and we use automatic feature selection to cope with the problems arising from the rich feature set. It is natural to expect that linguistic analysis of the word context would yield near-perfect performance in the task, but we show that too many features, even linguistic ones, introduce noise and make the task difficult for unsupervised and semi-supervised methods. We also show that purely syntactic features contribute most to performance, although certain semantic and morphological features are also needed.
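As a rough illustration of the setup described above, the following minimal sketch treats lexical choice as classification over binary context features, with a feature selection step before classification. It is not the method evaluated in the paper: the toy contexts, the feature names (such as `obj=abstract`), the candidate words, the information-gain ranking, and the 1-nearest-neighbour rule are all assumptions made purely for illustration.

```python
from collections import Counter
from math import log2

# Hypothetical toy data: each context is a set of active binary features,
# paired with the near-synonym that originally filled the gap.
TRAIN = [
    ({"subj=human", "obj=abstract", "tense=past"}, "ponder"),
    ({"subj=human", "obj=abstract", "tense=pres"}, "ponder"),
    ({"subj=human", "obj=clause", "tense=past"}, "think"),
    ({"subj=human", "obj=clause", "tense=pres"}, "think"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    return -sum(c / total * log2(c / total) for c in Counter(labels).values())


def information_gain(feature, data):
    """Reduction in label entropy from knowing whether `feature` is present."""
    n = len(data)
    present = [label for feats, label in data if feature in feats]
    absent = [label for feats, label in data if feature not in feats]
    conditional = sum(len(part) / n * entropy(part)
                      for part in (present, absent) if part)
    return entropy([label for _, label in data]) - conditional


def select_features(data, k):
    """Keep the k features with the highest information gain (ties by name)."""
    features = sorted({f for feats, _ in data for f in feats})
    ranked = sorted(features, key=lambda f: (-information_gain(f, data), f))
    return set(ranked[:k])


def choose(context, data, selected):
    """Pick the label of the nearest training context (1-NN).

    Distance is the size of the symmetric difference between the two
    feature sets, restricted to the selected features.
    """
    ctx = context & selected
    _, label = min(data, key=lambda item: len(ctx ^ (item[0] & selected)))
    return label


selected = select_features(TRAIN, k=2)
gap_context = {"subj=human", "obj=abstract", "tense=fut"}
print(sorted(selected))
print(choose(gap_context, TRAIN, selected))
```

In this toy data the subject and tense features carry no information about the label, so the selection step discards them and the classifier decides on the object-type features alone. This mirrors, in miniature, the point that uninformative features add noise rather than signal.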
