Overview of the INLI PAN at FIRE-2017 Track on Indian Native Language Identification

This overview paper describes the first shared task on Indian Native Language Identification (INLI), which was organized at FIRE 2017. Given a corpus of English comments from the Facebook pages of various newspapers, the objective of the task is to identify the native language of each comment's author among the following six Indian languages: Bengali, Hindi, Kannada, Malayalam, Tamil, and Telugu. Altogether, 26 approaches from 13 different teams were evaluated. In this paper, we give an overview of the approaches and discuss the results they obtained.
