Producing Monolingual and Parallel Web Corpora at the Same Time - SpiderLing and Bitextor’s Love Affair
Antonio Toral | Nikola Ljubesic | Miquel Esplà-Gomis | Filip Klubicka | Sergio Ortiz-Rojas
[1] Hae-Chang Rim, et al. An Empirical Study on Web Mining of Parallel Data, 2010, COLING.
[2] Ben Hutchinson, et al. Intrinsic versus Extrinsic Evaluations of Parsing Systems, 2003.
[3] Alain Désilets, et al. WeBiText: Building Large Heterogeneous Translation Memories from Parallel Web Content, 2008, TC.
[4] Philipp Koehn, et al. Findings of the 2013 Workshop on Statistical Machine Translation, 2013, WMT@ACL.
[5] Qingsheng Zhu, et al. Mining Bilingual Data from the Web with Adaptively Learnt Patterns, 2009, ACL/IJCNLP.
[6] Alexandra Antonova, et al. Building a Web-Based Parallel Corpus and Filtering Out Machine-Translated Text, 2011, BUCC@ACL.
[7] Ying Zhang, et al. Automatic Acquisition of Chinese-English Parallel Corpus from the Web, 2006, ECIR.
[8] Philipp Koehn, et al. Moses: Open Source Toolkit for Statistical Machine Translation, 2007, ACL.
[9] Andy Way, et al. Domain adaptation of statistical machine translation with domain-focused web crawling, 2014, Language Resources and Evaluation.
[10] Srinivas Bangalore, et al. A Scalable Approach to Building a Parallel Corpus from the Web, 2011, INTERSPEECH.
[11] Jian-Yun Nie, et al. Parallel Web text mining for cross-language IR, 2000, RIAO.
[12] Jörg Tiedemann, et al. News from OPUS - A collection of multilingual parallel corpora with tools and interfaces, 2009.
[13] Nikola Ljubesic, et al. Comparing two acquisition systems for automatically building an English-Croatian parallel corpus from multilingual websites, 2014, LREC.
[14] Richard M. Schwartz, et al. Fast and Robust Neural Network Joint Models for Statistical Machine Translation, 2014, ACL.
[15] Masao Utiyama, et al. Mining Parallel Texts from Mixed-Language Web Pages, 2009, MTSUMMIT.
[16] Andy Way, et al. Extrinsic evaluation of web-crawlers in machine translation: a study on Croatian-English for the tourism domain, 2014, EAMT.
[17] Silvia Bernardini, et al. The WaCky wide web: a collection of very large linguistically processed web-crawled corpora, 2009, Lang. Resour. Evaluation.
[18] Kristina Toutanova, et al. Extracting Parallel Sentences from Comparable Corpora using Document Level Alignment, 2010, NAACL.
[19] Srinivas Bangalore, et al. Harvesting Parallel Text in Multiple Languages with Limited Supervision, 2012, COLING.
[20] Dan Tufis, et al. Empirical Methods for Exploiting Parallel Texts, 2002, Lit. Linguistic Comput.
[21] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.
[22] Iñaki San Vicente, et al. PaCo2: A Fully Automated tool for gathering Parallel Corpora from the Web, 2012, LREC.
[23] Xiaoyi Ma, et al. BITS: a method for bilingual text search over the Web, 1999, MTSUMMIT.
[24] Jian-Yun Nie, et al. Cross-language information retrieval based on parallel texts and automatic mining of parallel texts from the Web, 1999, SIGIR '99.
[25] Philipp Koehn, et al. Dirt Cheap Web-Scale Parallel Text from the Common Crawl, 2013, ACL.
[26] Matthew G. Snover, et al. A Study of Translation Edit Rate with Targeted Human Annotation, 2006, AMTA.
[27] Yanhui Feng, et al. Parallel Sentences Mining From The Web, 2009.
[28] Philip Resnik, et al. Parallel strands: a preliminary investigation into mining the Web for bilingual text, 1998, AMTA.
[29] Noah A. Smith, et al. The Web as a Parallel Corpus, 2003, CL.
[30] Dragos Stefan Munteanu, et al. Improving Machine Translation Performance by Exploiting Non-Parallel Corpora, 2005, CL.
[31] Christopher D. Manning, et al. A Simple and Effective Hierarchical Phrase Reordering Model, 2008, EMNLP.
[32] Vít Suchomel, et al. Efficient Web Crawling for Large Text Corpora, 2012.
[33] Jörg Tiedemann, et al. Parallel Data, Tools and Interfaces in OPUS, 2012, LREC.
[34] Nadir Durrani, et al. A Joint Sequence Translation Model with Integrated Reordering, 2011, ACL.
[35] Anne Schneider, et al. Comparing intrinsic and extrinsic evaluation of MT output in a dialogue system, 2010, IWSLT.
[36] Philipp Koehn, et al. Europarl: A Parallel Corpus for Statistical Machine Translation, 2005, MTSUMMIT.
[37] Mikel L. Forcada, et al. Combining Content-Based and URL-Based Heuristics to Harvest Aligned Bitexts from Multilingual Sites with Bitextor, 2010, Prague Bull. Math. Linguistics.
[38] Gregor Thurmair, et al. A modular open-source focused crawler for mining monolingual and bilingual corpora from the web, 2013, BUCC@ACL.