Indian Native Language Identification - INLI 2018

The growth of digital platforms enables industries to offer user-specific services. Most of the time, information about internet users is not explicitly available, which acts as a constraint in developing personalized applications. Hence the need for author profiling tasks, which aim to predict internet users' characteristics from their texts. Native Language Identification (NLI) is one such author profiling task: it predicts an author's native language from texts they have written in another language. We propose the Indian Native Language Identification (INLI) task, in which internet users' texts are written in English and participants need to identify whether the user's native language is Tamil, Malayalam, Kannada, Telugu, Bengali, or Hindi. The corpus is collected from texts on regional newspaper pages on Facebook, under the hypothesis that a user who belongs to a particular region will read the news from the respective regional newspaper.
