Natural language processing for resource-poor languages

Natural language processing (NLP) aims, broadly speaking, to teach computers to understand human language. This is hard because the computer must handle many facets of language, such as semantics, syntax, pragmatics and phonology, which are difficult to characterize formally, let alone encode as computer instructions. It is even harder for so-called low-resource languages, for which annotated resources are very limited. There are approximately 7,000 languages in the world, but only a small fraction of them (20 languages) are considered high-resource. Low-resource languages are in dire need of tools and resources to overcome this barrier, so that advances in NLP can deliver more widespread benefits. Despite the lack of annotated data, some unannotated resources may still be useful for low-resource languages, including parallel data, bilingual lexical resources and clues from related languages. However, how to incorporate these resources effectively to improve low-resource NLP is an open research question, and the target of this thesis. Of the 7,000 languages, half have no writing system, and many are falling out of use. It is estimated that by the end of this century, half of the world’s languages will be extinct. Current NLP techniques therefore need to be extended to unwritten languages, so that these languages can be processed and documented before they are gone forever. Transfer learning provides an important opportunity for low-resource NLP, whereby annotation is transferred from a resource-rich source language to a resource-poor target language. In this thesis, we successfully apply transfer learning for
