Byte-Level Machine Reading Across Morphologically Varied Languages

The machine reading task, where a computer reads a document and answers questions about it, is important in artificial intelligence research. Recently, many models have been proposed to address it. Word-level models, which take words as units of input and output, have proven to yield state-of-the-art results when evaluated on English datasets. However, morphologically richer languages have many more unique words than English due to highly productive prefix and suffix mechanisms. This may hinder word-level models, as the vocabularies required may become too large for efficient computation. Multiple alternative input granularities have been proposed to avoid large input vocabularies, such as morphemes, character n-grams, and bytes. Bytes are advantageous as they provide a universal encoding format across languages and allow for a small vocabulary size, which, moreover, is identical for every input language. In this work, we investigate whether bytes are suitable as input units across morphologically varied languages. To test this, we introduce two large-scale machine reading datasets in morphologically rich languages, Turkish and Russian. We implement four byte-level models, representing the major types of machine reading models, and introduce a new seq2seq variant, called encoder-transformer-decoder. We show that, for all languages considered, there are byte-level models that outperform the current state-of-the-art word-level baseline. Moreover, the newly introduced encoder-transformer-decoder performs best on the morphologically most involved dataset, Turkish. The large-scale Turkish and Russian machine reading datasets are released to the public.
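To make the byte-level input representation concrete, the sketch below (not taken from the paper; the helper name and example words are illustrative) shows how text in any language maps to a fixed vocabulary of at most 256 symbols when encoded as UTF-8 bytes, which is the property the abstract highlights: the vocabulary is small and identical for every input language.

```python
# Minimal sketch, assuming plain UTF-8 byte encoding as the input granularity.
# Every string maps to integer IDs in 0..255, so the input vocabulary size is
# at most 256 regardless of the language or script.

def to_byte_ids(text: str) -> list[int]:
    """Map a string to its UTF-8 byte values (each in the range 0..255)."""
    return list(text.encode("utf-8"))

# Morphologically rich words are covered by the same 256-symbol vocabulary:
turkish = "evlerinizden"        # ev-ler-iniz-den, "from your houses"
russian = "преподавательница"   # "female teacher", stem plus several affixes

print(len(to_byte_ids(turkish)), to_byte_ids(turkish)[:5])
print(len(to_byte_ids(russian)), to_byte_ids(russian)[:5])
```

Note the trade-off this illustrates: Cyrillic characters occupy two bytes each in UTF-8, so byte sequences for Russian are roughly twice as long as the corresponding character sequences, while the Latin-based Turkish word stays close to one byte per character.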
