Controlling Utterance Length in NMT-based Word Segmentation with Attention

One of the basic tasks of computational language documentation (CLD) is to identify word boundaries in an unsegmented phonemic stream. While several unsupervised monolingual word segmentation algorithms exist in the literature, they are challenged in real-world CLD settings by the small amount of available data. A possible remedy is to take advantage of glosses or translations into a foreign, well-resourced language, which often exist for such data. In this paper, we explore and compare ways to exploit neural machine translation models to perform unsupervised boundary detection with bilingual information, notably introducing a new loss function for jointly learning alignment and segmentation. We experiment with an actual under-resourced language, Mboshi, and show that these techniques can effectively control the output segmentation length.
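To make the bilingual idea concrete, the sketch below illustrates one plausible reading of the approach, not the paper's exact formulation: word boundaries are hypothesized wherever the hardest-attended translation word changes between consecutive phonemes, and an auxiliary penalty on a soft boundary count is added to the translation loss so that the number of discovered segments is nudged toward the number of translation words. The function names, the total-variation boundary count, and the form of the penalty are all illustrative assumptions of this sketch.

```python
# Hedged sketch only: reading segment boundaries off a soft attention
# matrix and adding a length-control penalty to the NMT loss. The exact
# loss in the paper differs; all names and the penalty form are assumed.
import torch


def boundaries_from_attention(attn: torch.Tensor) -> torch.Tensor:
    """attn: (T_phones, S_words) soft attention over translation words
    for each target phoneme. Hypothesize a boundary wherever the
    argmax-attended source word changes between consecutive phonemes."""
    hard = attn.argmax(dim=-1)  # (T_phones,)
    changed = hard[1:] != hard[:-1]  # (T_phones - 1,) boolean
    return changed.nonzero(as_tuple=False).squeeze(-1) + 1  # boundary positions


def length_control_penalty(attn: torch.Tensor, n_source_words: int) -> torch.Tensor:
    """Differentiable surrogate for the boundary count: the total variation
    between consecutive attention rows is ~1 when attention jumps to a new
    word and ~0 when it stays put. Penalizing the distance between the soft
    segment count and the source word count controls segmentation length."""
    tv = 0.5 * (attn[1:] - attn[:-1]).abs().sum(dim=-1)  # (T_phones - 1,)
    soft_segment_count = tv.sum() + 1.0  # boundaries + 1 segments
    return (soft_segment_count - float(n_source_words)).abs()


# Toy usage: 6 phonemes attending over 3 translation words. In training,
# the penalty would be added to the cross-entropy loss with some weight.
attn = torch.softmax(torch.randn(6, 3), dim=-1)
print(boundaries_from_attention(attn))
print(length_control_penalty(attn, n_source_words=3))
```

The design choice worth noting is that the hard argmax read-out is used only at segmentation time, while the soft total-variation count keeps the length-control term differentiable for end-to-end training.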
