Computing Linear Discriminants for Idiomatic Sentence Detection

In this paper, we describe the binary classification of sentences into idiomatic and non-idiomatic. Our idiom detection algorithm is based on linear discriminant analysis (LDA). To obtain a discriminant subspace, we train our model on a small number of randomly selected idiomatic and non-idiomatic sentences. We then project both the training and the test data onto the chosen subspace and apply a three-nearest-neighbor (3NN) classifier to the projected test data to measure accuracy. The proposed approach is more general than previous algorithms for idiom detection: it relies on neither target idiom types, lexicons, nor large manually annotated corpora, and it does not restrict the search space to a particular linguistic construction.
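
The pipeline described above (fit a discriminant subspace on labeled sentences, project training and test data onto it, classify with 3NN) can be illustrated with a minimal sketch. It makes several assumptions not stated in the abstract: sentences are already encoded as fixed-length numeric feature vectors (here synthetic Gaussian data stands in for them), the two-class Fisher form of LDA is used, which yields a one-dimensional subspace, and a small ridge term keeps the within-class scatter matrix invertible. All function names are hypothetical, and the paper's actual formulation may differ.

```python
import random

def mean(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def scatter(rows, mu):
    """Scatter matrix: sum of (x - mu)(x - mu)^T over the rows."""
    d = len(mu)
    S = [[0.0] * d for _ in range(d)]
    for x in rows:
        diff = [xi - mi for xi, mi in zip(x, mu)]
        for i in range(d):
            for j in range(d):
                S[i][j] += diff[i] * diff[j]
    return S

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_lda_direction(X, y, ridge=1e-6):
    """Two-class Fisher discriminant: w = Sw^{-1} (m1 - m0).
    The ridge term is an assumption added for numerical stability."""
    X0 = [x for x, t in zip(X, y) if t == 0]
    X1 = [x for x, t in zip(X, y) if t == 1]
    m0, m1 = mean(X0), mean(X1)
    S0, S1 = scatter(X0, m0), scatter(X1, m1)
    d = len(m0)
    Sw = [[S0[i][j] + S1[i][j] + (ridge if i == j else 0.0)
           for j in range(d)] for i in range(d)]
    return solve(Sw, [a - b for a, b in zip(m1, m0)])

def project(X, w):
    """Project each vector onto the discriminant direction w."""
    return [sum(wi * xi for wi, xi in zip(w, x)) for x in X]

def knn_predict(z_train, y_train, z_test, k=3):
    """Majority vote among the k nearest projected training points."""
    preds = []
    for z in z_test:
        nearest = sorted(zip(z_train, y_train),
                         key=lambda p: abs(p[0] - z))[:k]
        votes = sum(label for _, label in nearest)
        preds.append(1 if 2 * votes > k else 0)
    return preds

# Synthetic stand-in for sentence features (assumption): label 1 plays
# the role of "idiomatic", label 0 of "non-idiomatic".
rng = random.Random(0)

def sample(center, n):
    return [[rng.gauss(c, 1.0) for c in center] for _ in range(n)]

X_train = sample([0, 0, 0], 20) + sample([3, 3, 3], 20)
y_train = [0] * 20 + [1] * 20
X_test = sample([0, 0, 0], 20) + sample([3, 3, 3], 20)
y_test = [0] * 20 + [1] * 20

w = fit_lda_direction(X_train, y_train)
preds = knn_predict(project(X_train, w), y_train, project(X_test, w), k=3)
accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
print(f"accuracy: {accuracy:.2f}")
```

Because the neighbors are found in the projected space rather than the original feature space, the distance computation reduces to a comparison of scalars; with more than two classes, or a different LDA variant, the subspace would have more than one dimension and the distance would need to be generalized accordingly.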
