Zipporah: a Fast and Scalable Data Cleaning System for Noisy Web-Crawled Parallel Corpora
[1] Anthony Rousseau,et al. XenC: An Open-Source Tool for Data Selection in Natural Language Processing , 2013, Prague Bull. Math. Linguistics.
[2] Ming Zhou,et al. Bilingual Data Cleaning for SMT using Graph-based Random Walk , 2013, ACL.
[3] Jörg Tiedemann,et al. Parallel Data, Tools and Interfaces in OPUS , 2012, LREC.
[4] Kenneth Heafield,et al. KenLM: Faster and Smaller Language Model Queries , 2011, WMT@EMNLP.
[5] Miquel Esplà-Gomis,et al. Bitextor, a free/open-source software to harvest translation memories from multilingual websites , 2009 .
[6] Philipp Koehn,et al. Moses: Open Source Toolkit for Statistical Machine Translation , 2007, ACL.
[7] Andreas Stolcke,et al. SRILM at Sixteen: Update and Outlook , 2011 .
[8] William D. Lewis,et al. Intelligent Selection of Language Model Training Data , 2010, ACL.
[9] Lucia Specia,et al. Quality estimation for translation selection , 2014, EAMT.
[10] Noah A. Smith,et al. A Simple, Fast, and Effective Reparameterization of IBM Model 2 , 2013, NAACL.
[11] Vladimir Eidelman,et al. cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models , 2010, ACL.
[12] Shahram Khadivi,et al. Parallel Corpus Refinement as an Outlier Detection Algorithm , 2011, MTSUMMIT.
[13] Alon Lavie,et al. The CMU-Avenue French-English Translation System , 2012, WMT@NAACL-HLT.
[14] Gaël Varoquaux,et al. Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res.
[15] Qun Liu,et al. Improving Statistical Machine Translation Performance by Training Data Selection and Optimization , 2007, EMNLP-CoNLL.
[16] Kevin Duh,et al. Adaptation Data Selection using Neural Language Models: Experiments in Machine Translation , 2013, ACL.
[17] Michel Simard. Clean data for training statistical MT: the case of MT contamination , 2014, AMTA.
[18] Marianna J. Martindale,et al. Class-based N-gram language difference models for data selection , 2015, IWSLT.
[19] Philipp Koehn,et al. Europarl: A Parallel Corpus for Statistical Machine Translation , 2005, MTSUMMIT.