A neural approach for inducing multilingual resources and natural language processing tools for low-resource languages

This work focuses on the rapid development of linguistic annotation tools for low-resource languages, that is, languages with no labeled training data. We experiment with several cross-lingual annotation projection methods based on recurrent neural network (RNN) models. The distinctive feature of our approach is that our multilingual word representation requires only a parallel corpus between the source and target languages. More precisely, our approach has the following characteristics: (a) it does not use word alignment information; (b) it does not assume any knowledge about the target languages, the one requirement being that the source and target languages are not too syntactically divergent, which makes it applicable to a wide range of low-resource languages; (c) it provides truly multilingual taggers (one tagger for N languages). We investigate both unidirectional and bidirectional RNN models and propose a method to include external information (for instance, low-level information from part-of-speech tags) in the RNN to train higher-level taggers (for instance, Super Sense taggers). We demonstrate the validity and genericity of our model using parallel corpora obtained by manual or automatic translation. Our experiments induce cross-lingual part-of-speech and Super Sense taggers. We also apply our approach in a weakly supervised context, where it shows excellent potential for very low-resource settings (fewer than 1k training utterances).
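To make the idea of an alignment-free multilingual word representation more concrete, the sketch below shows one possible way to place source and target words in a shared space using only a sentence-aligned parallel corpus. It is an illustrative assumption, not the method described in the abstract: each word is represented by a binary vector recording which sentence pairs it occurs in. The function names (`build_common_representation`, `cosine`) and the toy English-French corpus are hypothetical.

```python
from collections import defaultdict


def build_common_representation(parallel_corpus):
    """Map each (language, word) pair to a binary vector over sentence pairs.

    Assumption for illustration: vector[i] is 1 when the word occurs in the
    i-th sentence pair. No word alignment is computed at any point.
    """
    n_pairs = len(parallel_corpus)
    vectors = defaultdict(lambda: [0] * n_pairs)
    for i, (src_sentence, tgt_sentence) in enumerate(parallel_corpus):
        for word in src_sentence.split():
            vectors[("src", word)][i] = 1
        for word in tgt_sentence.split():
            vectors[("tgt", word)][i] = 1
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) * sum(b * b for b in v)) ** 0.5
    return dot / norm if norm else 0.0


# Hypothetical toy corpus: (source sentence, target sentence) pairs.
toy_corpus = [
    ("the cat sleeps", "le chat dort"),
    ("the dog sleeps", "le chien dort"),
    ("a cat eats", "un chat mange"),
]

vectors = build_common_representation(toy_corpus)
sim_cat_chat = cosine(vectors[("src", "cat")], vectors[("tgt", "chat")])
sim_dog_chat = cosine(vectors[("src", "dog")], vectors[("tgt", "chat")])
print(sim_cat_chat, sim_dog_chat)
```

Because source and target words share the same vector space in this sketch, a tagger trained on annotated source-side vectors could in principle be applied directly to target-side vectors, which is the intuition behind a single tagger for N languages.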
