Native Language Identification using Stacked Generalization

Ensemble methods using multiple classifiers have proven to be the most successful approach for the task of Native Language Identification (NLI), achieving the current state of the art. However, a systematic examination of ensemble methods for NLI has yet to be conducted. Additionally, deeper ensemble architectures such as classifier stacking have not been closely evaluated. We present a set of experiments using three ensemble-based models, testing each with multiple configurations and algorithms. This includes a rigorous application of meta-classification models for NLI, achieving state-of-the-art results on three datasets from different languages. We also present the first use of statistical significance testing for comparing NLI systems, showing that our results are significantly better than the previous state of the art. We make available a collection of test set predictions to facilitate future statistical tests.
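As an illustration of how released test set predictions could support such a comparison, the following is a minimal sketch of a paired significance test between two systems. It assumes an exact McNemar test, which the abstract does not name as the test used. The function name `mcnemar_exact` and the toy labels are hypothetical and are not taken from the paper.

```python
from math import comb


def mcnemar_exact(gold, pred_a, pred_b):
    """Two-sided exact McNemar test on paired predictions.

    Returns (b, c, p_value), where b counts items system A gets right and
    system B gets wrong, and c counts the reverse. Items where both systems
    agree in correctness carry no information and are ignored.
    """
    if not (len(gold) == len(pred_a) == len(pred_b)):
        raise ValueError("gold, pred_a and pred_b must have the same length")

    b = sum(1 for g, a, p in zip(gold, pred_a, pred_b) if a == g and p != g)
    c = sum(1 for g, a, p in zip(gold, pred_a, pred_b) if a != g and p == g)
    n = b + c
    if n == 0:
        # The systems never disagree in correctness: no evidence of a difference.
        return b, c, 1.0

    # Under the null hypothesis the discordant pairs follow Binomial(n, 0.5).
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Toy example (hypothetical labels, not real NLI data).
gold = ["ZH"] * 10
system_a = ["ZH"] * 10              # all ten correct
system_b = ["AR"] * 6 + ["ZH"] * 4  # six wrong, four correct
b, c, p = mcnemar_exact(gold, system_a, system_b)
print(b, c, p)
```

Because the test uses only the items on which the two systems disagree, it needs per-item predictions rather than aggregate accuracy, which is why a shared collection of test set predictions makes later comparisons possible.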
