Conflict Intensity Estimation from Speech Using Greedy Forward-Backward Feature Selection

In recent years, extracting non-trivial information from audio sources has become possible, which has given rise to a new area of speech technology known as computational paralinguistics. One task in this area was presented at the ComParE 2013 Challenge (using the SSPNet Conflict Corpus): determining the intensity of conflicts arising in speech recordings, based only on the audio information. Most authors approached this task by following standard paralinguistic practice, which is to extract a huge number of potential features and then perform the actual classification or regression in the hope that the machine learning method applied can completely ignore the irrelevant ones. Although current state-of-the-art methods can indeed handle an overcomplete feature set, studies show that they can still be aided by feature selection. We opted for a simple greedy feature selection algorithm, with which we were able to outperform all previous scores on the SSPNet Conflict dataset, achieving a UAR score of 85.6%.
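The abstract does not spell out the selection procedure, so the following is only a minimal sketch of a generic greedy forward-backward search, not the authors' implementation. It assumes a caller-supplied `score` function (hypothetical; in practice it might train a model on the chosen feature subset and return its development-set UAR) and also shows how UAR, the unweighted average of per-class recalls, can be computed. The names `uar`, `greedy_forward_backward`, `max_features`, and `tol` are illustrative.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def uar(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted average recall: the mean of the per-class recalls."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def greedy_forward_backward(
    n_features: int,
    score: Callable[[List[int]], float],  # assumed: higher is better
    max_features: Optional[int] = None,
    tol: float = 1e-9,
) -> Tuple[List[int], float]:
    """Greedy search over feature indices 0..n_features-1."""
    selected: List[int] = []
    best = float("-inf")
    limit = n_features if max_features is None else max_features

    while len(selected) < limit:
        # Forward step: add the single unused feature that scores best.
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        new_best, f_add = max((score(selected + [f]), f) for f in candidates)
        if new_best <= best + tol:
            break  # no candidate improves the score, so stop
        selected.append(f_add)
        best = new_best

        # Backward step: drop earlier features while doing so helps.
        improved = True
        while improved and len(selected) > 1:
            improved = False
            trials = [
                (score([g for g in selected if g != f]), f)
                for f in selected
                if f != f_add  # never drop the feature just added
            ]
            if trials:
                s, f_drop = max(trials)
                if s > best + tol:
                    selected.remove(f_drop)
                    best = s
                    improved = True

    return selected, best
```

Every accepted forward or backward move strictly increases the score, so the search terminates. The cost is dominated by the calls to `score`, which is why a cheap evaluation function matters when the starting feature set contains thousands of attributes.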
