Representations of Idioms for Natural Language Processing: Idiom type and token identification, Language Modelling and Neural Machine Translation

An idiom is a multiword expression (MWE) whose meaning is non-compositional, i.e., the meaning of the expression is different from the meaning of its individual components. Idioms are complex constructions of language used creatively across almost all text genres. Because of their non-compositional nature, idioms pose problems for natural language processing (NLP) systems, and processing them correctly can improve a wide range of these systems. Current approaches to idiom processing vary in the amount of discourse history required to extract the features needed to build representations for the expressions. These features are, in general, statistics extracted from the text, and they often fail to capture all the nuances involved in idiom usage. We argue in this thesis that more flexible representations must be used to process idioms in a range of idiom-related tasks. We demonstrate that high-dimensional representations allow idiom classifiers to better model the interactions between global and local features and thereby improve the performance of these systems on idioms. In support of this thesis we demonstrate that distributed representa-
