Transduction Recursive Auto-Associative Memory: Learning Bilingual Compositional Distributed Vector Representations of Inversion Transduction Grammars

We introduce TRAAM, or Transduction RAAM, a fully bilingual generalization of Pollack’s (1990) monolingual Recursive Auto-Associative Memory neural network model, in which each distributed vector represents a bilingual constituent: an instance of a transduction rule, which specifies a relation between two monolingual constituents and how their subconstituents should be permuted. Bilingual terminals are special cases of bilingual constituents, where a vector represents either (1) a bilingual token, i.e., a token-to-token or “word-to-word” translation rule, or (2) a bilingual segment, i.e., a segment-to-segment or “phrase-to-phrase” translation rule. TRAAMs have properties that appear attractive for bilingual grammar induction and statistical machine translation applications. Training drives both the autoencoder weights and the vector representations to evolve, so that similar bilingual constituents tend to receive more similar vectors.
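The core mechanism the abstract describes can be sketched as a RAAM-style auto-associator: two child bilingual-constituent vectors, plus the transduction rule's orientation (straight vs. inverted, as in an inversion transduction grammar), are encoded into a single parent vector and decoded back, with training driven by reconstruction error. The following numpy sketch is illustrative only; the activation function, the single orientation bit, the dimensionality, and all names are assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # dimensionality of a bilingual constituent vector (illustrative)

# Auto-associator weights, randomly initialized here; in a TRAAM-like model
# these would be trained to minimize reconstruction error over a biparsed corpus.
W_enc = rng.normal(scale=0.1, size=(D, 2 * D + 1))  # +1 input for the orientation bit
b_enc = np.zeros(D)
W_dec = rng.normal(scale=0.1, size=(2 * D + 1, D))
b_dec = np.zeros(2 * D + 1)

def compose(left, right, inverted):
    """Encode two child bilingual-constituent vectors, plus the transduction
    rule's straight/inverted orientation, into one parent vector."""
    x = np.concatenate([left, right, [1.0 if inverted else 0.0]])
    return np.tanh(W_enc @ x + b_enc)

def reconstruct(parent):
    """Decode a parent vector back into approximate children and orientation."""
    y = np.tanh(W_dec @ parent + b_dec)
    return y[:D], y[D:2 * D], y[2 * D]

def reconstruction_loss(left, right, inverted):
    """Squared reconstruction error for one composition step."""
    parent = compose(left, right, inverted)
    l_hat, r_hat, o_hat = reconstruct(parent)
    target_o = 1.0 if inverted else 0.0
    return (np.sum((l_hat - left) ** 2)
            + np.sum((r_hat - right) ** 2)
            + (o_hat - target_o) ** 2)

# Two bilingual terminals (word-to-word translation rules) as toy vectors.
left = np.tanh(rng.normal(size=D))
right = np.tanh(rng.normal(size=D))
parent = compose(left, right, inverted=True)  # parent is itself a D-dim vector
```

Because the parent vector has the same dimensionality as its children, composition can be applied recursively up a transduction tree, which is what lets similar bilingual constituents drift toward similar vectors as training proceeds.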

[1] Lonnie Chrisman. Learning Recursive Distributed Representations for Holistic Computation. 1991.

[2] Jianfeng Gao et al. Learning Continuous Phrase Representations for Translation Modeling. ACL, 2014.

[3] Hinrich Schütze et al. Cutting Recursive Autoencoder Trees. ICLR, 2013.

[4] Phil Blunsom et al. Multilingual Models for Compositional Distributed Semantics. ACL, 2014.

[5] Holger Schwenk et al. Continuous Space Language Models for Statistical Machine Translation. ACL, 2006.

[6] Joakim Nivre et al. Learning Stochastic Bracketing Inversion Transduction Grammars with a Cubic Time Biparsing Algorithm. IWPT, 2009.

[7] Christopher D. Manning et al. Bilingual Word Embeddings for Phrase-Based Machine Translation. EMNLP, 2013.

[8] Jing Zheng et al. An autoencoder with bilingual sparse features for improved statistical machine translation. ICASSP, 2014.

[9] Jason Weston et al. A unified architecture for natural language processing: deep neural networks with multitask learning. ICML, 2008.

[10] Jordan B. Pollack. Recursive Distributed Representations. Artificial Intelligence, 1990.

[11] Ivan Titov et al. Inducing Crosslingual Distributed Representations of Words. COLING, 2012.

[12] Marine Carpuat et al. Context-dependent phrasal translation lexicons for statistical machine translation. MT Summit, 2007.

[13] Richard M. Schwartz et al. Fast and Robust Neural Network Joint Models for Statistical Machine Translation. ACL, 2014.

[14] Jason Weston et al. Curriculum learning. ICML, 2009.

[15] Jorge Nocedal et al. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 1989.

[16] Andreas Stolcke et al. Tree matching with recursive distributed representations. AAAI, 1992.

[17] Kurt Hornik et al. Multilayer feedforward networks are universal approximators. Neural Networks, 1989.

[18] Jeffrey Pennington et al. Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions. EMNLP, 2011.

[19] Jeffrey L. Elman. Finding Structure in Time. Cognitive Science, 1990.

[20] Alexandre Allauzen et al. Continuous Space Translation Models with Neural Networks. NAACL, 2012.

[21] Yang Liu et al. Recursive Autoencoders for ITG-Based Translation. EMNLP, 2013.

[22] Dekai Wu. Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora. Computational Linguistics, 1997.

[23] Phil Blunsom et al. Recurrent Continuous Translation Models. EMNLP, 2013.

[24] Holger Schwenk et al. Continuous Space Translation Models for Phrase-Based Statistical Machine Translation. COLING, 2012.

[25] Yoshua Bengio et al. A Neural Probabilistic Language Model. Journal of Machine Learning Research, 2003.

[26] Richard Rohwer et al. The “Moving Targets” Training Algorithm. NIPS, 1989.

[27] Alfred V. Aho et al. The Theory of Parsing, Translation, and Compiling. 1972.

[28] Christoph Goller et al. Learning task-dependent distributed representations by backpropagation through structure. ICNN, 1996.

[29] Honglak Lee et al. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. ICML, 2009.