Ranking emotional attributes with deep neural networks

Studies have shown that ranking emotional attributes through preference learning methods has significant advantages over conventional emotion classification/regression frameworks. Preference learning is particularly appealing for retrieval tasks, where the goal is to identify speech conveying target emotional behaviors (e.g., samples with positive valence and low arousal). With recent advances in deep neural networks (DNNs), this study explores whether a preference learning framework relying on deep learning can outperform conventional ranking algorithms. We use a deep learning ranker implemented with the RankNet algorithm to evaluate preferences between emotional sentences in terms of dimensional attributes (arousal, valence, and dominance). The approach outperforms ranking algorithms trained with support vector machines (i.e., RankSVM), and the results are significantly better than those reported in previous work, demonstrating the potential of RankNet to retrieve speech with target emotional behaviors.
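
The core of RankNet is a single scoring network trained on pairwise comparisons: for a pair of sentences (i, j) where i is preferred, both are scored by the same network, and the predicted probability that i ranks above j is sigmoid(s_i - s_j), optimized with cross-entropy. The sketch below illustrates this pairwise loss in PyTorch; the Scorer architecture, the 88-dimensional feature vector (an eGeMAPS-sized acoustic representation), and the training setup are illustrative assumptions, not the configuration used in this study.

```python
# Minimal RankNet-style pairwise ranking sketch (after Burges et al., 2005).
# Hypothetical names: Scorer, ranknet_loss; hyperparameters are placeholders.
import torch
import torch.nn as nn

class Scorer(nn.Module):
    """Shared network mapping an acoustic feature vector to a scalar
    preference score for one emotional attribute (e.g., arousal)."""
    def __init__(self, feature_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)  # one score per sentence

def ranknet_loss(s_i, s_j, pref):
    """Pairwise cross-entropy on the score difference.
    pref = 1.0 when sentence i is ranked above sentence j, else 0.0."""
    return nn.functional.binary_cross_entropy_with_logits(s_i - s_j, pref)

# Toy usage with random "features"; a real system would build preference
# pairs from annotated arousal/valence/dominance labels.
torch.manual_seed(0)
model = Scorer(feature_dim=88)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x_i, x_j = torch.randn(32, 88), torch.randn(32, 88)
pref = torch.ones(32)  # assume i is preferred in every pair
loss = ranknet_loss(model(x_i), model(x_j), pref)
opt.zero_grad(); loss.backward(); opt.step()
```

Because the loss depends only on score differences, the trained network induces a total ordering over sentences, which is exactly what a retrieval task needs: rank all candidates by score and return the extremes for the target attribute.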
