Handling of Missing Values in Lexical Acquisition

In this work we propose a strategy to reduce the impact of the sparse data problem in lexical information acquisition tasks that rely on the observation of linguistic cues. Specifically, we propose a way to handle the uncertainty created by missing values. A zero value can mean either that the cue was not observed because the word in question does not belong to the class (negative evidence), or that the word simply did not occur, by chance, in the context sought (lack of evidence). This uncertainty creates problems for the learner: when examples with incompatible labels all show zero values, the cue loses its predictive capacity and is not taken into account, even though some samples do display the sought context. In this paper we present the results of experiments that try to reduce this uncertainty by replacing zero values with pre-processed estimates, as other authors do (Joanis et al. 2007, for instance). We report a first round of experiments, which served as the basis for estimates of linguistic information motivated by lexical classes. The experimental results show a clear benefit of the proposed approach.
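The replacement step described above can be sketched as follows. This is only a minimal illustration, not the estimation procedure used in the paper: the function name `fill_zero_cues`, the toy counts, and the choice of estimate (a cue's pooled rate over all words, scaled by the word's frequency) are all assumptions made for the example.

```python
def fill_zero_cues(counts, totals):
    """Replace zero cue counts with an expected count under a pooled rate.

    counts: dict mapping word -> list of cue counts (one entry per cue)
    totals: dict mapping word -> total occurrences of the word in the corpus

    The pooled rate is a stand-in (assumption) for the "pre-processed
    estimates" mentioned in the abstract.
    """
    n_cues = len(next(iter(counts.values())))
    grand_total = sum(totals.values())
    # Rate of each cue pooled over every word in the sample.
    rates = [
        sum(row[j] for row in counts.values()) / grand_total
        for j in range(n_cues)
    ]
    filled = {}
    for word, row in counts.items():
        # Observed counts are kept; only zeros are replaced by the estimate.
        filled[word] = [
            value if value > 0 else rates[j] * totals[word]
            for j, value in enumerate(row)
        ]
    return filled


# Toy data (invented): two cues, three words.
counts = {"water": [8, 0], "chair": [0, 6], "idea": [0, 0]}
totals = {"water": 40, "chair": 30, "idea": 10}
filled = fill_zero_cues(counts, totals)
```

With an estimate of this kind, a low-frequency word such as "idea" receives small non-zero values rather than hard zeros, so the learner no longer treats its unobserved cues as firm negative evidence.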
