The ARIEL-CMU Systems for LoReHLT18

This paper describes the ARIEL-CMU submissions to the Low Resource Human Language Technologies (LoReHLT) 2018 evaluations for the tasks of Machine Translation (MT), Entity Discovery and Linking (EDL), and detection of Situation Frames in Text and Speech (SF Text and SF Speech).
