Overview of TweetLID: Tweet Language Identification at SEPLN 2014

This article summarizes the TweetLID shared task and workshop held at SEPLN 2014. It describes the data collection and annotation process, the development and evaluation of the shared task, and the results achieved by the participants.
