TweetLID: a benchmark for tweet language identification
Arkaitz Zubiaga | Iñaki Alegria | José Ramom Pichel Campos | Pablo Gamallo | Víctor Fresno-Fernández | Aitzol Ezeiza | Nora Aranberri | Iñaki San Vicente