Mining Parallel Corpora from Sina Weibo and Twitter
[1] Timothy Baldwin,et al. Automatic Detection and Language Identification of Multilingual Documents , 2014, TACL.
[2] Fabienne Braune,et al. Improved Unsupervised Sentence Alignment for Symmetrical and Asymmetrical Parallel Corpora , 2010, COLING.
[3] Kenneth Ward Church,et al. A Program for Aligning Sentences in Bilingual Corpora , 1993, CL.
[4] Chris Callison-Burch,et al. Crowdsourcing Translation: Professional Quality from Non-Professionals , 2011, ACL.
[5] Stefan Riezler,et al. Twitter Translation using Translation-Based Cross-Lingual Retrieval , 2012, WMT@NAACL-HLT.
[6] Benjamin Van Durme,et al. Mining Parenthetical Translations from the Web by Word Alignment , 2008, ACL.
[7] Thomas Gottron,et al. A Comparison of Language Identification Approaches on Short, Query-Style Texts , 2010, ECIR.
[8] Alexander M. Fraser,et al. Improved Machine Translation Performance via Parallel Sentence Extraction from Comparable Corpora , 2004, NAACL.
[9] Kristina Toutanova,et al. Extracting Parallel Sentences from Comparable Corpora using Document Level Alignment , 2010, NAACL.
[10] Jakob Uszkoreit,et al. Large Scale Parallel Document Mining for Machine Translation , 2010, COLING.
[11] Matthias Eck,et al. Extracting translation pairs from social network content , 2014, IWSLT.
[12] Timothy Baldwin,et al. Lexical Normalisation of Short Text Messages: Makn Sens a #twitter , 2011, ACL.
[13] Noah A. Smith,et al. The Web as a Parallel Corpus , 2003, CL.
[14] Noah A. Smith,et al. A Dependency Parser for Tweets , 2014, EMNLP.
[15] Wang Ling,et al. Microblogs as Parallel Corpora , 2013, ACL.
[16] Robert L. Mercer,et al. The Mathematics of Statistical Machine Translation: Parameter Estimation , 1993, CL.
[17] Marc A. Zissman,et al. Comparison of Four Approaches to Automatic Language Identification of Telephone Speech , 2004 .
[18] Wang Ling,et al. Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation , 2015, EMNLP.
[19] Oren Etzioni,et al. Open domain event extraction from twitter , 2012, KDD.
[20] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[21] Wang Ling,et al. Crowdsourcing High-Quality Parallel Data Extraction from Twitter , 2014, WMT@ACL.
[22] Danah Boyd,et al. I tweet honestly, I tweet passionately: Twitter users, context collapse, and the imagined audience , 2011, New Media Soc..
[23] Daniel Marcu,et al. Statistical Phrase-Based Translation , 2003, NAACL.
[24] Jinxi Xu,et al. Evaluating a probabilistic model for cross-lingual information retrieval , 2001, SIGIR '01.
[25] Michael I. Jordan,et al. Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..
[26] Noah A. Smith,et al. A Simple, Fast, and Effective Reparameterization of IBM Model 2 , 2013, NAACL.
[27] Matt Post,et al. Constructing Parallel Corpora for Six Indian Languages via Crowdsourcing , 2012, WMT@NAACL-HLT.
[28] Simon J. Greenhill. Levenshtein Distances Fail to Identify Language Relationships Accurately , 2011, CL.
[29] Brendan T. O'Connor,et al. Improved Part-of-Speech Tagging for Online Conversational Text with Word Clusters , 2013, NAACL.
[30] Dragos Stefan Munteanu,et al. Improving Machine Translation Performance by Exploiting Non-Parallel Corpora , 2005, CL.
[31] Wang Ling,et al. Paraphrasing 4 Microblog Normalization , 2013, EMNLP.
[32] Bo Li,et al. Mining Chinese-English Parallel Corpora from the Web , 2008, IJCNLP.
[33] Maria Leonor Pacheco,et al. of the Association for Computational Linguistics: , 2001 .
[34] Brendan T. O'Connor,et al. Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments , 2010, ACL.
[35] Jaime G. Carbonell,et al. Collaborative workflow for crowdsourcing translation , 2012, CSCW.
[36] Takashi Chikayama,et al. A Fast and Accurate Method for Detecting English-Japanese Parallel Texts , 2006 .
[37] Hermann Ney,et al. Sentence segmentation using IBM word alignment model 1 , 2005, EAMT.
[38] Joel D. Martin,et al. Improving Translation Quality by Discarding Most of the Phrasetable , 2007, EMNLP.
[39] Stephan Vogel,et al. Can Crowds Build parallel corpora for Machine Translation Systems? , 2010, Mturk@HLT-NAACL.
[40] Theresa Wilson,et al. Language Identification for Creating Language-Specific Twitter Collections , 2012 .
[41] Ian H. Witten,et al. The WEKA data mining software: an update , 2009, SKDD.
[42] Nanyun Peng,et al. Learning Polylingual Topic Models from Code-Switched Social Media Documents , 2014, ACL.
[43] Jimmy J. Lin,et al. Why Not Grab a Free Lunch? Mining Large Corpora for Parallel Sentences to Improve Translation Modeling , 2012, NAACL.
[44] Franz Josef Och,et al. Minimum Error Rate Training in Statistical Machine Translation , 2003, ACL.
[45] William Yang Wang,et al. Dependency Parsing for Weibo: An Efficient Probabilistic Logic Programming Approach , 2014, EMNLP.