Detecting the intensity of cognitive and physical load using AdaBoost and deep rectifier neural networks

The Interspeech ComParE 2014 Challenge consists of two machine learning tasks, which have quite a small number of examples. Due to our good results in ComParE 2013, we considered AdaBoost a suitable machine learning meta-algorithm for these tasks, besides we also experimented with Deep Rectifier Neural Networks. These differ from traditional neural networks in that the former have several hidden layers, and use rectifier neurons as hidden units. With AdaBoost we achieved competitive results, whereas with the neural networks we were able to outperform baseline SVM scores in both Sub-Challenges. Index Terms :s peech technology, AdaBoost, deep neural networks, rectifier activation function

[1]  Balázs Kégl,et al.  Boosting products of base classifiers , 2009, ICML '09.

[2]  Yoshua Bengio,et al.  Deep Sparse Rectifier Neural Networks , 2011, AISTATS.

[3]  Róbert Busa-Fekete,et al.  Detecting autism, emotions and social signals using adaboost , 2013, INTERSPEECH.

[4]  Brian Kingsbury,et al.  New types of deep neural network learning for speech recognition and related applications: an overview , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[5]  László Tóth,et al.  Training HMM/ANN Hybrid Speech Recognizers by Probabilistic Sampling , 2005, ICANN.

[6]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[7]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[8]  David A. van Leeuwen,et al.  NIST and NFI-TNO evaluations of automatic speaker recognition , 2006, Comput. Speech Lang..

[9]  Balázs Kégl,et al.  MULTIBOOST: A Multi-purpose Boosting Package , 2012, J. Mach. Learn. Res..

[10]  Björn W. Schuller,et al.  Automatic recognition of physiological parameters in the human voice: Heart rate and skin conductance , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[11]  Fabio Valente,et al.  The INTERSPEECH 2013 computational paralinguistics challenge: social signals, conflict, emotion, autism , 2013, INTERSPEECH.

[12]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[13]  Yoram Singer,et al.  Improved Boosting Algorithms Using Confidence-rated Predictions , 1998, COLT' 98.

[14]  Tara N. Sainath,et al.  Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.

[15]  Laurence Devillers,et al.  Detection of real-life emotions in call centers , 2005, INTERSPEECH.

[16]  Richard O. Duda,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[17]  Navdeep Jaitly,et al.  Application of Pretrained Deep Neural Networks to Large Vocabulary Conversational Speech Recognition , 2012 .

[18]  Dong Yu,et al.  Feature engineering in Context-Dependent Deep Neural Networks for conversational speech transcription , 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.

[19]  Heekuck Oh,et al.  Neural Networks for Pattern Recognition , 1993, Adv. Comput..

[20]  László Tóth,et al.  A Comparison of Deep Neural Network Training Methods for Large Vocabulary Speech Recognition , 2013, TSD.

[21]  Dong Yu,et al.  Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[22]  Fabien Ringeval,et al.  The INTERSPEECH 2014 computational paralinguistics challenge: cognitive & physical load , 2014, INTERSPEECH.

[23]  Xiaolong Wang,et al.  Convolutional Deep Networks for Visual Data Classification , 2012, Neural Processing Letters.

[24]  Bernhard Schölkopf,et al.  Estimating the Support of a High-Dimensional Distribution , 2001, Neural Computation.

[25]  Geoffrey E. Hinton,et al.  On rectified linear units for speech processing , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[26]  T. IstvánNagy,et al.  Document Classification with Deep Rectifier Neural Networks and Probabilistic Sampling , 2014, TSD.

[27]  László Tóth,et al.  Combining time- and frequency-domain convolution in convolutional neural network-based phone recognition , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[28]  Alberto Maria Segre,et al.  Programs for Machine Learning , 1994 .

[29]  László Tóth Phone recognition with deep sparse rectifier neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[30]  B. Kégl,et al.  Fast boosting using adversarial bandits , 2010, ICML.

[31]  László Tóth Convolutional deep rectifier neural nets for phone recognition , 2013, INTERSPEECH.

[32]  Peter Auer,et al.  Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.