Paper information - On internal language representations in deep learning: an analysis of machine translation and speech recognition


Language technology has become pervasive in everyday life. Neural networks are a key component in this technology thanks to their ability to model large amounts of data. Contrary to traditional systems, models based on deep neural networks (a.k.a. deep learning) can be trained in an end-to-end fashion on input-output pairs, such as a sentence in one language and its translation in another language, or a speech utterance and its transcription. The end-to-end training paradigm simplifies the engineering process while giving the model flexibility to optimize for the desired task. This, however, often comes at the expense of model interpretability: understanding the role of different parts of the deep neural network is difficult, and such models are sometimes perceived as “black-box”, hindering research efforts and limiting their utility to society. This thesis investigates what kind of linguistic information is represented in deep learning models for written and spoken language. In order to study this question, I develop a unified methodology for evaluating internal representations in neural networks, consisting of three steps: training a model on a complex end-to-end task; generating feature representations from different parts of the trained model; and training classifiers on simple supervised learning tasks using the representations. I demonstrate the approach on two core tasks in human language technology: machine translation and speech recognition. I perform a battery of experiments comparing different layers, modules, and architectures in end-to-end models that are trained on these tasks, and evaluate their quality at different linguistic levels. First, I study how neural machine translation models learn morphological information. Second, I compare lexical semantic and part-of-speech information in neural machine translation. Third, I investigate where syntactic and semantic structures are captured
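The three-step methodology described in the abstract (train an end-to-end model, extract representations from its parts, train simple classifiers on them) can be illustrated with a minimal sketch. This is not the thesis's implementation. It rests on loud assumptions: a small network with fixed random weights stands in for the trained translation or speech model, a toy binary label stands in for a linguistic annotation, and every name (`layer_activations`, `train_probe`, `probe_layers`) is hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def layer_activations(x, weights):
    # Step 2: run one input through the frozen network and keep every layer's output.
    activations, h = [], x
    for W in weights:
        h = [math.tanh(sum(w * v for w, v in zip(row, h))) for row in W]
        activations.append(h)
    return activations


def train_probe(features, labels, epochs=30, lr=0.1):
    # Step 3: a logistic-regression classifier trained with plain SGD.
    w, b = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            g = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def accuracy(w, b, features, labels):
    # Fraction of examples whose predicted class matches the label.
    hits = sum(
        int((sum(wi * xi for wi, xi in zip(w, x)) + b > 0) == (y == 1))
        for x, y in zip(features, labels)
    )
    return hits / len(labels)


def probe_layers(seed=0, n_train=200, n_test=100, dim=8, n_layers=3):
    rng = random.Random(seed)
    # Step 1 stand-in (ASSUMPTION): fixed random weights replace a trained model.
    weights = [[[rng.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
               for _ in range(n_layers)]

    def sample(n):
        xs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
        # Toy property (ASSUMPTION): the sign of the first input feature.
        ys = [1 if x[0] > 0 else 0 for x in xs]
        return xs, ys

    train_x, train_y = sample(n_train)
    test_x, test_y = sample(n_test)
    train_acts = [layer_activations(x, weights) for x in train_x]
    test_acts = [layer_activations(x, weights) for x in test_x]

    # One probe per layer; held-out accuracy is read as a proxy for how
    # accessible the property is in that layer's representation.
    results = {}
    for layer in range(n_layers):
        w, b = train_probe([a[layer] for a in train_acts], train_y)
        results[layer] = accuracy(w, b, [a[layer] for a in test_acts], test_y)
    return results


if __name__ == "__main__":
    for layer, acc in probe_layers().items():
        print(f"layer {layer}: probe accuracy {acc:.2f}")
```

In this sketch the network is frozen while the probes are trained, so differences in probe accuracy across layers reflect the representations rather than any further training of the network itself.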

Yonatan Belinkov
