Comparing Speech and Text Classification on ICNALE

In this paper, we explore and compare speech-based and text-based classification approaches on a corpus of native and non-native English speakers. We experiment on a subset of the International Corpus Network of Asian Learners of English (ICNALE) that contains both the recorded speeches and their corresponding text transcriptions. Our results suggest a high correlation between the spoken and written classification results, indicating that a speaker's native accent is strongly correlated with the grammatical structures found in the text.