Determining Native Language and Deception Using Phonetic Features and Classifier Combination

For several years, the Interspeech ComParE Challenge has focused on paralinguistic tasks of various kinds. In this paper we address the Native Language and Deception Sub-Challenges of ComParE 2016, where the goals are to identify the native language of the speaker and to recognize deceptive speech, respectively. As both tasks can be treated as classification problems, we experiment with several state-of-the-art machine learning methods (Support Vector Machines, AdaBoost.MH and Deep Neural Networks), and also test a simple yet robust combination method. Furthermore, we assume that the native language of the speaker affects the pronunciation of specific phonemes in the language he or she is currently speaking. To exploit this, we extract phonetic features for the Native Language task. For the Deception Sub-Challenge, we compensate for the highly unbalanced class distribution by instance re-sampling. With these techniques we are able to significantly outperform the baseline SVM on the unpublished test set.
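The two generic techniques named above, instance re-sampling and classifier combination, can be illustrated with a minimal sketch. The abstract does not specify the exact re-sampling scheme or combination rule, so the code below assumes random oversampling of the rarer classes and a weighted mean of class posteriors; the function names `oversample` and `combine_posteriors`, the class labels, and the toy values are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def oversample(instances, labels, seed=0):
    """Randomly duplicate instances of the rarer classes until every class
    has as many instances as the most frequent one (assumed scheme)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(instances, labels):
        by_class[y].append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw the missing instances with replacement from the same class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def combine_posteriors(posteriors, weights=None):
    """Weighted mean of per-classifier posterior vectors for one instance
    (assumed combination rule; uniform weights by default)."""
    if weights is None:
        weights = [1.0] * len(posteriors)
    total = sum(weights)
    n_classes = len(posteriors[0])
    return [
        sum(w * p[c] for w, p in zip(weights, posteriors)) / total
        for c in range(n_classes)
    ]


# Toy usage: three "truthful" utterances versus one "deceptive" utterance.
x, y = oversample(["u1", "u2", "u3", "u4"],
                  ["truthful", "truthful", "truthful", "deceptive"])
# Posteriors of two hypothetical classifiers for a single test utterance.
combined = combine_posteriors([[0.6, 0.4], [0.2, 0.8]])
predicted_class = max(range(len(combined)), key=lambda c: combined[c])
```

In this sketch only the training set would be re-sampled; the test instances keep their original distribution, and the combined posterior is turned into a decision by taking the most probable class.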
