On internal language representations in deep learning: an analysis of machine translation and speech recognition

Language technology has become pervasive in everyday life. Neural networks are a key component of this technology thanks to their ability to model large amounts of data. In contrast to traditional systems, models based on deep neural networks (a.k.a. deep learning) can be trained end-to-end on input-output pairs, such as a sentence in one language and its translation in another, or a speech utterance and its transcription. The end-to-end training paradigm simplifies the engineering process while giving the model flexibility to optimize for the desired task. This, however, often comes at the expense of model interpretability: understanding the role of different parts of a deep neural network is difficult, and such models are sometimes perceived as "black boxes", which hinders research efforts and limits their utility to society. This thesis investigates what kind of linguistic information is represented in deep learning models for written and spoken language. To study this question, I develop a unified methodology for evaluating internal representations in neural networks, consisting of three steps: training a model on a complex end-to-end task; generating feature representations from different parts of the trained model; and training classifiers on simple supervised learning tasks using the representations. I demonstrate the approach on two core tasks in human language technology: machine translation and speech recognition. I perform a battery of experiments comparing different layers, modules, and architectures in end-to-end models trained on these tasks, and evaluate the quality of their representations at different linguistic levels. First, I study how neural machine translation models learn morphological information. Second, I compare lexical semantic and part-of-speech information in neural machine translation. Third, I investigate where syntactic and semantic structures are captured
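
The three-step methodology described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the toy data, the one-hidden-layer network standing in for an end-to-end model, the binary property `z_prop` standing in for a linguistic label, and all function names and hyperparameters are hypothetical. The sketch only shows the general shape of the procedure: train on a main task, freeze the model and extract features, then train a simple classifier on those features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 8-dim inputs, a main regression target, and a
# separate binary property that the main model is never trained to predict.
X = rng.normal(size=(600, 8))
y_main = np.tanh(X[:, :4].sum(axis=1, keepdims=True))  # stand-in end-to-end target
z_prop = (X[:, 0] + X[:, 1] > 0).astype(int)           # stand-in probed property


def train_main_model(X, y, hidden=16, steps=300, lr=0.05):
    """Step 1: train a one-hidden-layer network on the main task (MSE loss)."""
    W1 = rng.normal(scale=0.3, size=(X.shape[1], hidden))
    W2 = rng.normal(scale=0.3, size=(hidden, 1))
    for _ in range(steps):
        H = np.tanh(X @ W1)                 # hidden activations
        err = H @ W2 - y                    # prediction error
        grad_W2 = H.T @ err / len(X)
        grad_W1 = X.T @ ((err @ W2.T) * (1 - H ** 2)) / len(X)
        W1 -= lr * grad_W1
        W2 -= lr * grad_W2
    return W1, W2


def extract_features(X, W1):
    """Step 2: read out hidden activations from the frozen, trained model."""
    return np.tanh(X @ W1)


def train_probe(F, z, steps=500, lr=0.5):
    """Step 3: fit a logistic-regression classifier on the extracted features."""
    w = np.zeros(F.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(F @ w + b)))
        w -= lr * F.T @ (p - z) / len(z)
        b -= lr * np.mean(p - z)
    return w, b


def probe_accuracy(F, z, w, b):
    """Fraction of examples whose property label the probe predicts correctly."""
    return float(np.mean(((F @ w + b) > 0).astype(int) == z))


# Train on the first 400 examples, evaluate the probe on the held-out 200.
W1, _ = train_main_model(X[:400], y_main[:400])
F_train = extract_features(X[:400], W1)
F_test = extract_features(X[400:], W1)
w, b = train_probe(F_train, z_prop[:400])
acc = probe_accuracy(F_test, z_prop[400:], w, b)
print(f"probe accuracy on held-out data: {acc:.3f}")
```

In this kind of setup, the held-out accuracy of the classifier is read as a rough indication of how much information about the property is available in the chosen representation. Repeating steps 2 and 3 for different layers or modules would allow them to be compared on the same property.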

[1]  Christopher D. Manning,et al.  Effective Approaches to Attention-based Neural Machine Translation , 2015, EMNLP.

[2]  James R. Glass,et al.  Segmentation for English-to-Arabic Statistical Machine Translation , 2008, ACL.

[3]  Liang Lu,et al.  Multitask Learning with Low-Level Auxiliary Tasks for Encoder-Decoder Based Speech Recognition , 2017, INTERSPEECH.

[4]  Tim Miller,et al.  Explanation in Artificial Intelligence: Insights from the Social Sciences , 2017, Artif. Intell..

[5]  Klaus-Robert Müller,et al.  "What is relevant in a text document?": An interpretable machine learning approach , 2016, PloS one.

[6]  Hung-yi Lee,et al.  Towards Unsupervised Automatic Speech Recognition Trained by Unaligned Speech and Text only , 2018, ArXiv.

[7]  Danqi Chen,et al.  A Fast and Accurate Dependency Parser using Neural Networks , 2014, EMNLP.

[8]  Holger Schwenk,et al.  Continuous Space Translation Models for Phrase-Based Statistical Machine Translation , 2012, COLING.

[9]  Zhizheng Wu,et al.  Investigating gated recurrent neural networks for speech synthesis , 2016, ArXiv.

[10]  Laura Mascarell,et al.  Improving Word Sense Disambiguation in Neural Machine Translation with Sense Embeddings , 2017, WMT.

[11]  Rico Sennrich,et al.  How Grammatical is Character-level Neural Machine Translation? Assessing MT Quality with Contrastive Translation Pairs , 2016, EACL.

[12]  Roberto Navigli,et al.  EuroSense: Automatic Harvesting of Multilingual Sense Annotations from Parallel Text , 2017, ACL.

[13]  Bowen Zhou,et al.  Classifying Relations by Ranking with Convolutional Neural Networks , 2015, ACL.

[14]  Ananthram Swami,et al.  Practical Black-Box Attacks against Machine Learning , 2016, AsiaCCS.

[15]  Tara N. Sainath,et al.  Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[16]  Robert L. Mercer,et al.  Class-Based n-gram Models of Natural Language , 1992, CL.

[17]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[18]  Helmut Schmid,et al.  LoPar: Design and Implementation , 2000 .

[19]  Mark Fishel,et al.  Linguistically Motivated Unsupervised Segmentation for Machine Translation , 2010, LREC.

[20]  Marc Dymetman,et al.  Using Syntactic Coupling Features for Discriminating Phrase-Based Translations (WMT-08 Shared Translation Task) , 2008, WMT@ACL.

[21]  Philipp Koehn,et al.  Findings of the 2017 Conference on Machine Translation (WMT17) , 2017, WMT.

[22]  Alex Waibel,et al.  Connectionist F-structure transfer , 1997 .

[23]  Philipp Koehn,et al.  Empirical Methods for Compound Splitting , 2003, EACL.

[24]  Graham Neubig,et al.  Travatar: A Forest-to-String Machine Translation Engine based on Tree Transducers , 2013, ACL.

[25]  Xing Shi,et al.  Does String-Based Neural MT Learn Source Syntax? , 2016, EMNLP.

[26]  Katsuhito Sudoh,et al.  Neural Reordering Model Considering Phrase Translation and Word Alignment for Phrase-based Translation , 2016, WAT@COLING.

[27]  Martin Wattenberg,et al.  Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation , 2016, TACL.

[28]  James R. Glass,et al.  Deep multimodal semantic embeddings for speech and images , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).

[29]  Yonatan Belinkov,et al.  Analysis of sentence embedding models using prediction tasks in natural language processing , 2017, IBM J. Res. Dev..

[30]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[31]  Joseph Olive,et al.  Handbook of Natural Language Processing and Machine Translation: DARPA Global Autonomous Language Exploitation , 2011 .

[32]  Jason Weston,et al.  A unified architecture for natural language processing: deep neural networks with multitask learning , 2008, ICML '08.

[33]  Yajie Miao,et al.  EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).

[34]  Sadaoki Furui,et al.  History and Development of Speech Recognition , 2010 .

[35]  Jacob Andreas,et al.  Semantics-Based Machine Translation with Hyperedge Replacement Grammars , 2012, COLING.

[36]  Vasudeva Varma,et al.  Interpretation of Semantic Tweet Representations , 2017, 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM).

[37]  Ying Zhang,et al.  Batch normalized recurrent neural networks , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[38]  Mário J. Silva,et al.  TUGAS: Exploiting unlabelled data for Twitter sentiment analysis , 2014, *SEMEVAL.

[39]  Yonghui Wu,et al.  Exploring the Limits of Language Modeling , 2016, ArXiv.

[40]  Yang Liu,et al.  Tree-to-String Alignment Template for Statistical Machine Translation , 2006, ACL.

[41]  Daniel Jurafsky,et al.  Understanding Neural Networks through Representation Erasure , 2016, ArXiv.

[42]  Eduard H. Hovy,et al.  A Model of Coherence Based on Distributed Sentence Representation , 2014, EMNLP.

[43]  Daniel Marcu,et al.  What’s in a translation rule? , 2004, NAACL.

[44]  Alexander M. Fraser,et al.  Producing Unseen Morphological Variants in Statistical Machine Translation , 2017, EACL.

[45]  Daniel Marcu,et al.  Statistical Phrase-Based Translation , 2003, NAACL.

[46]  Yonatan Belinkov,et al.  Language processing and learning models for community question answering in Arabic , 2017, Inf. Process. Manag..

[47]  Yonatan Belinkov,et al.  Evaluating Layers of Representation in Neural Machine Translation on Part-of-Speech and Semantic Tagging Tasks , 2017, IJCNLP.

[48]  Mathias Creutz,et al.  Morphology-aware statistical machine translation based on morphs induced in an unsupervised manner , 2007, MTSUMMIT.

[49]  Alexander Binder,et al.  Layer-Wise Relevance Propagation for Deep Neural Network Architectures , 2016 .

[50]  Tasha Nagamine,et al.  On the Role of Nonlinear Transformations in Deep Neural Network Acoustic Models , 2016, INTERSPEECH.

[51]  Baobao Chang,et al.  Max-Margin Tensor Neural Network for Chinese Word Segmentation , 2014, ACL.

[52]  Mikel L. Forcada,et al.  Recursive Hetero-associative Memories for Translation , 1997, IWANN.

[53]  Navdeep Jaitly,et al.  Towards End-To-End Speech Recognition with Recurrent Neural Networks , 2014, ICML.

[54]  Yonatan Belinkov,et al.  What do Neural Machine Translation Models Learn about Morphology? , 2017, ACL.

[55]  J.A.U. Paul,et al.  In other words. A coursebook on translation [Review of: M. Bakker (1994) -] , 1994 .

[56]  J. Elman Representation and structure in connectionist models , 1991 .

[57]  N. Koncar A Natural Language Translation Neural Network , 1994 .

[58]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[59]  Marco Baroni,et al.  Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks , 2017, ICML.

[60]  Jian Zhang,et al.  SQuAD: 100,000+ Questions for Machine Comprehension of Text , 2016, EMNLP.

[61]  Allyson Ettinger,et al.  Probing for semantic evidence of composition by means of simple classification tasks , 2016, RepEval@ACL.

[62]  Hung-yi Lee,et al.  Gate Activation Signal Analysis for Gated Recurrent Neural Networks and its Correlation with Phoneme Boundaries , 2017, INTERSPEECH.

[63]  Marine Carpuat,et al.  Improving Statistical Machine Translation Using Word Sense Disambiguation , 2007, EMNLP.

[64]  Adrian Weller,et al.  Challenges for Transparency , 2017, ArXiv.

[65]  Fei-Fei Li,et al.  Visualizing and Understanding Recurrent Networks , 2015, ArXiv.

[66]  Ananthram Swami,et al.  Crafting adversarial input sequences for recurrent neural networks , 2016, MILCOM 2016 - 2016 IEEE Military Communications Conference.

[67]  Jinyu Li,et al.  Feature Learning in Deep Neural Networks - Studies on Speech Recognition Tasks. , 2013, ICLR 2013.

[68]  Hermann Ney,et al.  The Alignment Template Approach to Statistical Machine Translation , 2004, CL.

[69]  John DeNero,et al.  A Class-Based Agreement Model for Generating Accurately Inflected Translations , 2012, ACL.

[70]  Christian Poellabauer,et al.  An Overview of Vulnerabilities of Voice Controlled Systems , 2018, ArXiv.

[71]  Philipp Koehn,et al.  CCG Supertags in Factored Statistical Machine Translation , 2007, WMT@ACL.

[72]  Philipp Koehn,et al.  Factored Translation Models , 2007, EMNLP.

[73]  Devdatt P. Dubhashi,et al.  Extractive Summarization using Continuous Vector Space Models , 2014, CVSC@EACL.

[74]  Frederick Liu,et al.  Handling Homographs in Neural Machine Translation , 2017, NAACL.

[75]  Quanshi Zhang,et al.  Visual interpretability for deep learning: a survey , 2018, Frontiers of Information Technology & Electronic Engineering.

[76]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[77]  Nicholas Noinaj,et al.  Summary and Future Directions , 2018, Carbon Nanotubes.

[78]  Rico Sennrich,et al.  Linguistic Input Features Improve Neural Machine Translation , 2016, WMT.

[79]  Tara N. Sainath,et al.  State-of-the-Art Speech Recognition with Sequence-to-Sequence Models , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[80]  Yoshua Bengio,et al.  On the Properties of Neural Machine Translation: Encoder–Decoder Approaches , 2014, SSST@EMNLP.

[81]  Alex Waibel,et al.  JANUS: a speech-to-speech translation system using connectionist and symbolic processing strategies , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[82]  Geoffrey E. Hinton,et al.  Understanding how Deep Belief Networks perform acoustic modelling , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[83]  Josef van Genabith,et al.  How Robust Are Character-Based Word Embeddings in Tagging and MT Against Wrod Scramlbing or Randdm Nouse? , 2017, AMTA.

[84]  W. Bruce Croft,et al.  Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , 2013 .

[85]  Navdeep Jaitly,et al.  Hybrid speech recognition with Deep Bidirectional LSTM , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.

[86]  Johan Bos,et al.  Semantic Tagging with Deep Residual Networks , 2016, COLING.

[87]  Maosong Sun,et al.  A Neural Reordering Model for Phrase-based Translation , 2014, COLING.

[88]  Xuanjing Huang,et al.  Investigating Language Universal and Specific Properties in Word Embeddings , 2016, ACL.

[89]  Zhe Gan,et al.  Learning Generic Sentence Representations Using Convolutional Neural Networks , 2016, EMNLP.

[90]  Wu Chou Minimum Classification Error (MCE) Approach in Pattern Recognition , 2003 .

[91]  Xiaodong He,et al.  Character-Level Question Answering with Attention , 2016, EMNLP.

[92]  Holger Schwenk,et al.  Continuous Space Language Models for Statistical Machine Translation , 2006, ACL.

[93]  Emmanuel Dupoux,et al.  Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies , 2016, TACL.

[94]  Grzegorz Chrupala,et al.  Representation of Linguistic Form and Function in Recurrent Neural Networks , 2016, CL.

[95]  Geoffrey E. Hinton,et al.  Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[96]  Stephan Oepen,et al.  SemEval 2014 Task 8: Broad-Coverage Semantic Dependency Parsing , 2014, *SEMEVAL.

[97]  Kevin Knight,et al.  A Syntax-based Statistical Translation Model , 2001, ACL.

[98]  Johan Bos,et al.  The Parallel Meaning Bank: Towards a Multilingual Corpus of Translations Annotated with Compositional Meaning Representations , 2017, EACL.

[99]  Yoon Kim,et al.  Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.

[100]  Kevin Duh,et al.  On the Elements of an Accurate Tree-to-String Machine Translation System , 2014, ACL.

[101]  Yuji Matsumoto,et al.  Phrase reordering for statistical machine translation based on predicate-argument structure , 2006, IWSLT.

[102]  Noah A. Smith,et al.  What Do Recurrent Neural Network Grammars Learn About Syntax? , 2016, EACL.

[103]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[104]  James H. Martin,et al.  Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition , 2000 .

[105]  Carla Teixeira Lopes,et al.  TIMIT Acoustic-Phonetic Continuous Speech Corpus , 2012 .

[106]  Stuart M. Shieber,et al.  Synchronous Tree-Adjoining Grammars , 1990, COLING.

[107]  Xiaojin Zhu,et al.  Using Machine Teaching to Identify Optimal Training-Set Attacks on Machine Learners , 2015, AAAI.

[108]  Tara N. Sainath,et al.  Acoustic modelling with CD-CTC-SMBR LSTM RNNS , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).

[109]  Yanjun Qi,et al.  Black-Box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers , 2018, 2018 IEEE Security and Privacy Workshops (SPW).

[110]  Joan Bruna,et al.  Intriguing properties of neural networks , 2013, ICLR.

[111]  Geoffrey E. Hinton,et al.  Phoneme recognition using time-delay neural networks , 1989, IEEE Trans. Acoust. Speech Signal Process..

[112]  Yu Zhang,et al.  Exploring neural network architectures for acoustic modeling , 2017 .

[113]  Victor H. Yngve A framework for syntactic translation , 1957, Mech. Transl. Comput. Linguistics.

[114]  G. Heigold,et al.  A Linguistic Evaluation of Rule-Based, Phrase-Based, and Neural MT Engines , 2017, Prague Bull. Math. Linguistics.

[115]  Warren J. Plath Early years in machine translation: memoirs and biographies of pioneers , 2002, Computational Linguistics.

[116]  Rico Sennrich,et al.  Evaluating Discourse Phenomena in Neural Machine Translation , 2017, NAACL.

[117]  Grzegorz Chrupala,et al.  On the difficulty of a distributional semantics of spoken language , 2018, ArXiv.

[118]  Yann Dauphin,et al.  Convolutional Sequence to Sequence Learning , 2017, ICML.

[119]  Fernando Pereira,et al.  Multilingual Dependency Analysis with a Two-Stage Discriminative Parser , 2006, CoNLL.

[120]  Shuai Wang,et al.  What Does the Speaker Embedding Encode? , 2017, INTERSPEECH.

[121]  Christine D. Piatko,et al.  Using “Annotator Rationales” to Improve Machine Learning for Text Categorization , 2007, NAACL.

[122]  Xiang Zhang,et al.  Character-level Convolutional Networks for Text Classification , 2015, NIPS.

[123]  Rico Sennrich,et al.  Deep architectures for Neural Machine Translation , 2017, WMT.

[124]  Yuan Ding,et al.  Machine Translation Using Probabilistic Synchronous Dependency Insertion Grammars , 2005, ACL.

[125]  John Cocke,et al.  A Statistical Approach to Machine Translation , 1990, CL.

[126]  Jürgen Schmidhuber,et al.  Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.

[127]  Joakim Nivre,et al.  Analyzing and Integrating Dependency Parsers , 2011, CL.

[128]  Yann Dauphin,et al.  A Convolutional Encoder Model for Neural Machine Translation , 2016, ACL.

[129]  Jason Eisner,et al.  Learning Non-Isomorphic Tree Mappings for Machine Translation , 2003, ACL.

[130]  Nadir Durrani,et al.  Hindi-to-Urdu Machine Translation through Transliteration , 2010, ACL.

[131]  Noah A. Smith,et al.  Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , 2016, ACL 2016.

[132]  Dawn Xiaodong Song,et al.  Delving into Transferable Adversarial Examples and Black-box Attacks , 2016, ICLR.

[133]  Christopher Joseph Pal,et al.  Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning , 2018, ICLR.

[134]  D. R. Reddy An approach to computer speech recognition by direct analysis of the speech wave , 1966 .

[135]  Chris Quirk,et al.  Dependency Treelet Translation: Syntactically Informed Phrasal SMT , 2005, ACL.

[136]  S. T. Dumais,et al.  Using latent semantic analysis to improve access to textual information , 1988, CHI '88.

[137]  Christopher D. Manning,et al.  Optimizing Chinese Word Segmentation for Machine Translation Performance , 2008, WMT@ACL.

[138]  Yonatan Belinkov,et al.  Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks , 2016, ICLR.

[139]  Daniel Marcu,et al.  Scalable Inference and Training of Context-Rich Syntactic Translation Models , 2006, ACL.

[140]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[141]  Patrick D. McDaniel,et al.  Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples , 2016, ArXiv.

[142]  Christopher D. Manning,et al.  Learning Continuous Phrase Representations and Syntactic Parsing with Recursive Neural Networks , 2010 .

[143]  Pierre Isabelle,et al.  A Challenge Set Approach to Evaluating Machine Translation , 2017, EMNLP.

[144]  I. McLean Example-based machine translation using connectionist matching , 1992, TMI.

[145]  Xinlei Chen,et al.  Visualizing and Understanding Neural Models in NLP , 2015, NAACL.

[146]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[147]  Nadir Durrani,et al.  Investigating the Usefulness of Generalized Word Representations in SMT , 2014, COLING.

[148]  Kristina Toutanova,et al.  Generating Complex Morphology for Machine Translation , 2007, ACL.

[149]  Felix Hill,et al.  Learning Distributed Representations of Sentences from Unlabelled Data , 2016, NAACL.

[150]  François Yvon,et al.  Evaluating the morphological competence of Machine Translation Systems , 2017, WMT.

[151]  Margaret King,et al.  Using Test Suites in Evaluation of Machine Translation Systems , 1990, COLING.

[152]  Ekaterina Vylomova,et al.  Word Representation Models for Morphologically Rich Languages in Neural Machine Translation , 2016, SWCN@EMNLP.

[153]  Hwee Tou Ng,et al.  Word Sense Disambiguation Improves Statistical Machine Translation , 2007, ACL.

[154]  Yu Zhang,et al.  Very deep convolutional networks for end-to-end speech recognition , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[155]  Tommi S. Jaakkola,et al.  A causal framework for explaining the predictions of black-box sequence-to-sequence models , 2017, EMNLP.

[156]  Preslav Nakov,et al.  Combining Word-Level and Character-Level Models for Machine Translation Between Closely-Related Languages , 2012, ACL.

[157]  Grzegorz Chrupala,et al.  Representations of language in a model of visually grounded speech signal , 2017, ACL.

[158]  David Weinberger,et al.  Accountability of AI Under the Law: The Role of Explanation , 2017, ArXiv.

[159]  Daniel Jurafsky,et al.  Lexicon-Free Conversational Speech Recognition with Neural Networks , 2015, NAACL.

[160]  Yoav Goldberg,et al.  Towards String-To-Tree Neural Machine Translation , 2017, ACL.

[161]  Gregory Shakhnarovich,et al.  Visually Grounded Learning of Keyword Prediction from Untranscribed Speech , 2017, INTERSPEECH.

[162]  Wojciech Samek,et al.  Methods for interpreting and understanding deep neural networks , 2017, Digit. Signal Process..

[163]  Yonatan Belinkov,et al.  Neural Attention for Learning to Rank Questions in Community Question Answering , 2016, COLING.

[164]  Ankur Taly,et al.  Axiomatic Attribution for Deep Networks , 2017, ICML.

[165]  James R. Glass,et al.  Learning Word-Like Units from Joint Audio-Visual Analysis , 2017, ACL.

[166]  Lucia Specia,et al.  Shallow Semantic Trees for SMT , 2011, WMT@EMNLP.

[167]  Alexander M. Fraser,et al.  Target-side Word Segmentation Strategies for Neural Machine Translation , 2017, WMT.

[168]  Klaus-Robert Müller,et al.  Explaining Recurrent Neural Network Predictions in Sentiment Analysis , 2017, WASSA@EMNLP.

[169]  Cícero Nogueira dos Santos,et al.  Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts , 2014, COLING.

[170]  Christof Monz,et al.  The Importance of Being Recurrent for Modeling Hierarchical Structure , 2018, EMNLP.

[171]  Björn W. Schuller,et al.  From speech to letters - using a novel neural network architecture for grapheme based ASR , 2009, 2009 IEEE Workshop on Automatic Speech Recognition & Understanding.

[172]  Nizar Habash,et al.  MADAMIRA: A Fast, Comprehensive Tool for Morphological Analysis and Disambiguation of Arabic , 2014, LREC.

[173]  Bo Xu,et al.  Learning New Semi-Supervised Deep Auto-encoder Features for Statistical Machine Translation , 2014, ACL.

[174]  Bryan Rink,et al.  A Novel Distributional Approach to Multilingual Conceptual Metaphor Recognition , 2014, COLING.

[175]  Ye Zhang,et al.  Rationale-Augmented Convolutional Neural Networks for Text Classification , 2016, EMNLP.

[176]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[177]  Xirong Li,et al.  Deep Text Classification Can be Fooled , 2017, IJCAI.

[178]  Daniel Gildea,et al.  Comparing Representations of Semantic Roles for String-To-Tree Decoding , 2014, EMNLP.

[179]  Guillaume Lample,et al.  What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties , 2018, ACL.

[180]  Percy Liang,et al.  Understanding Black-box Predictions via Influence Functions , 2017, ICML.

[181]  C. D. Forgie,et al.  Automatic Recognition of Spoken Digits , 1958 .

[182]  Luke S. Zettlemoyer,et al.  Deep Contextualized Word Representations , 2018, NAACL.

[183]  Kevin Knight,et al.  A Decoder for Syntax-based Statistical MT , 2002, ACL.

[184]  Daniel Jurafsky,et al.  A Hierarchical Neural Autoencoder for Paragraphs and Documents , 2015, ACL.

[185]  Jason Lee,et al.  Fully Character-Level Neural Machine Translation without Explicit Segmentation , 2016, TACL.

[186]  Stephan Vogel,et al.  Utilizing Target-Side Semantic Role Labels to Assist Hierarchical Phrase-based Machine Translation , 2011, SSST@ACL.

[187]  Pascale Fung,et al.  Can Semantic Role Labeling Improve SMT? , 2009, EAMT.

[188]  David Chiang,et al.  A Hierarchical Phrase-Based Model for Statistical Machine Translation , 2005, ACL.

[189]  Alexander M. Fraser,et al.  Neural Morphological Tagging of Lemma Sequences for Machine Translation , 2018, AMTA.

[190]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[191]  Preslav Nakov,et al.  A Hybrid Morpheme-Word Representation for Machine Translation of Morphologically Rich Languages , 2010, EMNLP.

[192]  Yonatan Belinkov,et al.  Understanding and Improving Morphological Learning in the Neural Machine Translation Decoder , 2017, IJCNLP.

[193]  Marcel van Gerven,et al.  Explanation Methods in Deep Learning: Users, Values, Concerns and Challenges , 2018, ArXiv.

[194]  Jonathon Shlens,et al.  Explaining and Harnessing Adversarial Examples , 2014, ICLR.

[195]  Regina Barzilay,et al.  Unsupervised Morphology Rivals Supervised Morphology for Arabic MT , 2012, ACL.

[196]  Lonnie Chrisman,et al.  Learning Recursive Distributed Representations for Holistic Computation , 1991 .

[197]  Daniel Gildea,et al.  Semantic Roles for String to Tree Machine Translation , 2013, ACL.

[198]  Yonatan Belinkov,et al.  Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems , 2017, NIPS.

[199]  Eric P. Xing,et al.  Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , 2014, ACL 2014.

[200]  Alexander M. Rush,et al.  Character-Aware Neural Language Models , 2015, AAAI.

[201]  Jeffrey L. Elman,et al.  Finding Structure in Time , 1990, Cogn. Sci..

[202]  Daniel Povey,et al.  The Kaldi Speech Recognition Toolkit , 2011 .

[203]  Geoffrey E. Hinton,et al.  Learning representations by back-propagating errors , 1986, Nature.

[204]  J. Elman Distributed representations, simple recurrent networks, and grammatical structure , 1991, Machine Learning.

[205]  Mauro Cettolo,et al.  WIT3: Web Inventory of Transcribed and Translated Talks , 2012, EAMT.

[206]  Yoshua Bengio,et al.  End-to-end attention-based large vocabulary speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[207]  Wei Xu,et al.  Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation , 2016, TACL.

[208]  Yang Liu,et al.  Visualizing and Understanding Neural Machine Translation , 2017, ACL.

[209]  Rico Sennrich,et al.  Predicting Target Language CCG Supertags Improves Neural Machine Translation , 2017, WMT.

[210]  Christopher Potts,et al.  A large annotated corpus for learning natural language inference , 2015, EMNLP.

[211]  Ding Liu,et al.  Improved Tree-to-String Transducer for Machine Translation , 2008, WMT@ACL.

[212]  George Kurian,et al.  Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.

[213]  Omer Levy,et al.  Neural Word Embedding as Implicit Matrix Factorization , 2014, NIPS.

[214]  Jason Weston,et al.  Curriculum learning , 2009, ICML '09.

[215]  Trevor Darrell,et al.  Multimodal Explanations: Justifying Decisions and Pointing to the Evidence , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[216]  Holger Schwenk,et al.  Supervised Learning of Universal Sentence Representations from Natural Language Inference Data , 2017, EMNLP.

[217]  Nina Narodytska,et al.  Simple Black-Box Adversarial Attacks on Deep Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[218]  Xavier Carreras,et al.  Non-Projective Parsing for Statistical Machine Translation , 2009, EMNLP.

[219]  Christopher Potts,et al.  Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank , 2013, EMNLP.

[220]  Gemma Boleda,et al.  Distributional vectors encode referential attributes , 2015, EMNLP.

[221]  Mauro Cettolo An Arabic-Hebrew parallel corpus of TED talks , 2016, ArXiv.

[222]  Hsiao-Wuen Hon,et al.  Speaker-independent phone recognition using hidden Markov models , 1989, IEEE Trans. Acoust. Speech Signal Process..

[223]  Arne Köhn,et al.  What’s in an Embedding? Analyzing Word Embeddings through Multilingual Evaluation , 2015, EMNLP.

[224]  Hermann Ney,et al.  Acoustic modeling with deep neural networks using raw time signal for LVCSR , 2014, INTERSPEECH.

[225]  Tom M. Mitchell,et al.  Learning Effective and Interpretable Semantic Models using Non-Negative Sparse Embedding , 2012, COLING.

[226]  Tasha Nagamine,et al.  Exploring how deep neural networks form phonemic categories , 2015, INTERSPEECH.

[227]  James R. Glass,et al.  Unsupervised Learning of Spoken Language with Visual Context , 2016, NIPS.

[228]  Yonatan Belinkov,et al.  Synthetic and Natural Noise Both Break Neural Machine Translation , 2017, ICLR.

[229]  Been Kim,et al.  Towards A Rigorous Science of Interpretable Machine Learning , 2017, 1702.08608.

[230]  Holger Schwenk,et al.  Continuous space language models , 2007, Comput. Speech Lang..

[231]  Alex Graves,et al.  Generating Sequences With Recurrent Neural Networks , 2013, ArXiv.

[232]  Yoshimasa Tsuruoka,et al.  Tree-to-Sequence Attentional Neural Machine Translation , 2016, ACL.

[233]  Regina Barzilay,et al.  Rationalizing Neural Predictions , 2016, EMNLP.

[234]  Alex Acero,et al.  Spoken Language Processing: A Guide to Theory, Algorithm and System Development , 2001 .

[235]  Andrew W. Senior,et al.  Long short-term memory recurrent neural network architectures for large scale acoustic modeling , 2014, INTERSPEECH.

[236]  Shozo Makino,et al.  Recognition of consonant based on the perceptron model , 1983, ICASSP.

[237]  Sanjeev Khudanpur,et al.  Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[238]  Tapio Salakoski,et al.  Care Episode Retrieval , 2014, Louhi@EACL.

[239]  Ding Liu,et al.  Semantic Role Features for Machine Translation , 2010, COLING.

[240]  P. Neural Network Classifiers for Speech Recognition , 2007 .

[241]  Facebook,et al.  Houdini : Fooling Deep Structured Visual and Speech Recognition Models with Adversarial Examples , 2017 .

[242]  Chong Wang,et al.  Reading Tea Leaves: How Humans Interpret Topic Models , 2009, NIPS.

[243]  Yoshua Bengio,et al.  The representational geometry of word meanings acquired by neural machine translation models , 2017, Machine Translation.

[244]  Yoshua Bengio,et al.  Word Representations: A Simple and General Method for Semi-Supervised Learning , 2010, ACL.

[245]  Yonatan Belinkov,et al.  VectorSLU: A Continuous Word Vector Approach to Answer Selection in Community Question Answering Systems , 2015, *SEMEVAL.

[246]  David Chiang,et al.  Hierarchical Phrase-Based Translation , 2007, CL.

[247]  Yonatan Belinkov,et al.  A Character-level Convolutional Neural Network for Distinguishing Similar Languages and Dialects , 2016, VarDial@COLING.

[248]  Lukás Burget,et al.  Recurrent neural network based language model , 2010, INTERSPEECH.

[249]  Liang Huang,et al.  A Syntax-Directed Translator with Extended Domain of Locality , 2006 .

[250]  Kevin Gimpel,et al.  Tailoring Continuous Word Representations for Dependency Parsing , 2014, ACL.

[251]  Richard Socher,et al.  A Neural Network for Factoid Question Answering over Paragraphs , 2014, EMNLP.

[252]  Philipp Koehn,et al.  Exploring Word Sense Disambiguation Abilities of Neural Machine Translation Systems (Non-archival Extended Abstract) , 2018, AMTA.

[253]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[254]  Helmut Schmid,et al.  Part-of-Speech Tagging With Neural Networks , 1994, COLING.

[255]  Mitchell P. Marcus,et al.  Maximum entropy models for natural language ambiguity resolution , 1998 .

[256]  Fernando Pereira,et al.  Weighted finite-state transducers in speech recognition , 2002, Comput. Speech Lang..

[257]  Wang Ling,et al.  Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation , 2015, EMNLP.

[258]  Stephan Oepen,et al.  Who Did What to Whom? A Contrastive Study of Syntacto-Semantic Dependencies , 2012, LAW@ACL.

[259]  Chong Wang,et al.  Deep Speech 2: End-to-End Speech Recognition in English and Mandarin , 2015, ICML.

[260]  P. Denes,et al.  The design and operation of the mechanical speech recognizer at University College London , 1959 .

[261]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[262]  Mohit Bansal,et al.  Interpreting Neural Networks to Improve Politeness Comprehension , 2016, EMNLP.

[263]  Quoc V. Le,et al.  Listen, Attend and Spell , 2015, ArXiv.

[264]  Moustapha Cissé,et al.  Fooling End-To-End Speaker Verification With Adversarial Examples , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[265]  Victor O. K. Li,et al.  Non-Autoregressive Neural Machine Translation , 2017, ICLR.

[266]  Tara N. Sainath,et al.  Learning the speech front-end with raw waveform CLDNNs , 2015, INTERSPEECH.

[267]  Pascale Fung,et al.  Semantic Roles for SMT: A Hybrid Two-Pass Model , 2009, NAACL.

[268]  James R. Glass.  A probabilistic framework for segment-based speech recognition , 2003, Comput. Speech Lang..

[269]  Sameep Mehta,et al.  Towards Crafting Text Adversarial Samples , 2017, ArXiv.

[270]  David A. Wagner,et al.  Audio Adversarial Examples: Targeted Attacks on Speech-to-Text , 2018, 2018 IEEE Security and Privacy Workshops (SPW).

[271]  Blaine Nelson,et al.  Poisoning Attacks against Support Vector Machines , 2012, ICML.

[272]  Sharon Goldwater,et al.  Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , 2012 .

[273]  Jonathan Le Roux,et al.  Discriminative Training for Large-Vocabulary Speech Recognition Using Minimum Classification Error , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[274]  Lemao Liu,et al.  Neural Machine Translation with Source Dependency Representation , 2017, EMNLP.

[275]  Quoc V. Le,et al.  Grounded Compositional Semantics for Finding and Describing Images with Sentences , 2014, TACL.

[276]  Willem H. Zuidema,et al.  Visualisation and 'diagnostic classifiers' reveal how recurrent and recursive neural networks process hierarchical structure , 2017, J. Artif. Intell. Res..

[277]  James R. Glass,et al.  Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech , 2018, INTERSPEECH.

[278]  Yonatan Belinkov,et al.  Large-Scale Machine Translation between Arabic and Hebrew: Available Corpora and Initial Results , 2016, ArXiv.

[279]  Ewan Dunbar,et al.  Learning Weakly Supervised Multimodal Phoneme Embeddings , 2017, INTERSPEECH.

[280]  Joakim Nivre,et al.  Dependency Grammar and Dependency Parsing , 2005 .

[281]  Victor Zue,et al.  The MIT SUMMIT Speech Recognition System: A Progress Report , 1989, HLT.

[282]  Wei Chen,et al.  A Character-Aware Encoder for Neural Machine Translation , 2016, COLING.

[283]  Ruslan Mitkov,et al.  Recent Advances in Natural Language Processing: Selected Papers from RANLP ’95 , 1997 .

[284]  Daniel P. W. Ellis,et al.  Brief History of Automatic Speech Recognition , 2011 .

[285]  Roger Wattenhofer,et al.  Natural Language Multitasking: Analyzing and Improving Syntactic Saliency of Hidden Representations , 2018, NIPS.

[286]  Philipp Koehn,et al.  Syntax-based Statistical Machine Translation , 2016, Synthesis Lectures on Human Language Technologies.

[287]  José A. R. Fonollosa,et al.  Character-based Neural Machine Translation , 2016, ACL.

[288]  David M. Blei,et al.  Probabilistic topic models , 2012, Commun. ACM.

[289]  Lorna Balkan,et al.  TSNLP - Test Suites for Natural Language Processing , 1996, COLING.

[290]  Yusuke Miyao,et al.  SemEval 2015 Task 18: Broad-Coverage Semantic Dependency Parsing , 2015, *SEMEVAL.

[291]  Sara Veldhoen,et al.  Diagnostic Classifiers Revealing how Neural Networks Process Hierarchical Structure , 2016, CoCo@NIPS.

[292]  Marcin Junczys-Dowmunt,et al.  The United Nations Parallel Corpus v1.0 , 2016, LREC.

[293]  Yoshua Bengio,et al.  Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[294]  Xuanjing Huang,et al.  Analyzing Linguistic Knowledge in Sequential Model of Sentence , 2016, EMNLP.

[295]  Kristina Toutanova,et al.  Applying Morphology Generation Models to Machine Translation , 2008, ACL.

[296]  Philipp Koehn,et al.  Neural Machine Translation , 2017, ArXiv.

[297]  Georgiana Dinu,et al.  Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors , 2014, ACL.

[298]  Alexander M. Fraser,et al.  Modeling Inflection and Word-Formation in SMT , 2012, EACL.

[299]  Jinxi Xu,et al.  String-to-Dependency Statistical Machine Translation , 2010, CL.

[300]  Tom M. Mitchell,et al.  A Compositional and Interpretable Semantic Space , 2015, NAACL.

[301]  Eduard H. Hovy,et al.  Recursive Deep Models for Discourse Parsing , 2014, EMNLP.

[302]  Sanja Fidler,et al.  Skip-Thought Vectors , 2015, NIPS.

[303]  Bin Yu,et al.  Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs , 2018, ICLR.

[304]  Philipp Koehn,et al.  Moses: Open Source Toolkit for Statistical Machine Translation , 2007, ACL.

[305]  Phil Blunsom,et al.  Recurrent Continuous Translation Models , 2013, EMNLP.

[306]  Yoshimasa Tsuruoka,et al.  Learning to Parse and Translate Improves Neural Machine Translation , 2017, ACL.

[307]  Geoffrey E. Hinton,et al.  A Scalable Hierarchical Distributed Language Model , 2008, NIPS.

[308]  Miles Osborne,et al.  Statistical Machine Translation , 2010, Encyclopedia of Machine Learning and Data Mining.

[309]  Noah D. Goodman,et al.  Evaluating Compositionality in Sentence Embeddings , 2018, CogSci.

[310]  Nizar Habash,et al.  Introduction to Arabic Natural Language Processing , 2010, Introduction to Arabic Natural Language Processing.

[311]  Kevin Duh,et al.  Extracting Pre-ordering Rules from Predicate-Argument Structures , 2011, IJCNLP.

[312]  Christopher D. Manning,et al.  Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models , 2016, ACL.

[313]  D. B. Fry,et al.  Theoretical aspects of mechanical speech recognition , 1959 .

[314]  Lalit R. Bahl,et al.  A Maximum Likelihood Approach to Continuous Speech Recognition , 1983, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[315]  Yonatan Belinkov,et al.  Challenging Language-Dependent Segmentation for Arabic: An Application to Machine Translation and Part-of-Speech Tagging , 2017, ACL.

[316]  Grzegorz Chrupala,et al.  Encoding of phonology in a recurrent neural model of grounded speech , 2017, CoNLL.

[317]  Yoshimasa Tsuruoka,et al.  Neural Machine Translation with Source-Side Latent Graph Parsing , 2017, EMNLP.

[318]  Adam Lopez,et al.  From Characters to Words to in Between: Do We Capture Morphology? , 2017, ACL.

[319]  Nizar Habash,et al.  Arabic Preprocessing Schemes for Statistical Machine Translation , 2006 .

[320]  Cícero Nogueira dos Santos,et al.  Learning Character-level Representations for Part-of-Speech Tagging , 2014, ICML.

[321]  Grzegorz Chrupala,et al.  From phonemes to images: levels of representation in a recurrent neural model of visually-grounded language learning , 2016, COLING.

[322]  Percy Liang,et al.  Adversarial Examples for Evaluating Reading Comprehension Systems , 2017, EMNLP.

[323]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[324]  Yonatan Belinkov,et al.  Exploring Compositional Architectures and Word Vector Representations for Prepositional Phrase Attachment , 2014, Transactions of the Association for Computational Linguistics.

[325]  Francisco Casacuberta,et al.  A connectionist approach to machine translation , 1997, EUROSPEECH.

[326]  Philip Resnik,et al.  Modeling Syntactic and Semantic Structures in Hierarchical Phrase-based Translation , 2013, HLT-NAACL.

[327]  Yonatan Belinkov,et al.  QMDIS: QCRI-MIT Advanced Dialect Identification System , 2017, INTERSPEECH.

[328]  Edouard Grave,et al.  Colorless Green Recurrent Networks Dream Hierarchically , 2018, NAACL.

[329]  Tomaso Poggio,et al.  From Understanding Computation to Understanding Neural Circuitry , 1976 .

[330]  Dejing Dou,et al.  HotFlip: White-Box Adversarial Examples for NLP , 2017, ArXiv.

[331]  Hagen Soltau,et al.  Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition , 2016, INTERSPEECH.

[332]  Daniel Jurafsky,et al.  A Conditional Random Field Word Segmenter for Sighan Bakeoff 2005 , 2005, IJCNLP.

[333]  Bhuvana Ramabhadran,et al.  Direct Acoustics-to-Word Models for English Conversational Speech Recognition , 2017, INTERSPEECH.

[334]  Rico Sennrich,et al.  Controlling Politeness in Neural Machine Translation via Side Constraints , 2016, NAACL.

[335]  Ai Ti Aw,et al.  A tree-to-tree alignment-based model for statistical machine translation , 2007, MTSUMMIT.

[336]  Nigel G. Ward.  Machine Translation: Past, Present, Future , 2001 .

[337]  Robert L. Mercer,et al.  The Mathematics of Statistical Machine Translation: Parameter Estimation , 1993, CL.

[338]  Eduard H. Hovy,et al.  When Are Tree Structures Necessary for Deep Learning of Representations? , 2015, EMNLP.

[339]  Hermann Ney,et al.  Improving SMT quality with morpho-syntactic analysis , 2000, COLING.

[340]  Jonathan Berant,et al.  Semantic Parsing via Paraphrasing , 2014, ACL.

[341]  Quoc V. Le,et al.  Massive Exploration of Neural Machine Translation Architectures , 2017, EMNLP.

[342]  Zachary Chase Lipton.  The mythos of model interpretability , 2016, ACM Queue.