Analyse et reconnaissance des émotions lors de conversations de centres d'appels. (Automatic emotion recognition in call center conversations)

Automatic emotion recognition in speech is a relatively recent research topic in the field of speech processing, having been studied for only about ten years. It now receives considerable attention, not only in academia but also in industry, thanks to improvements in the performance and reliability of recognition systems. Early work was based on acted, and therefore non-spontaneous, data. Even today, most studies exploit pre-segmented sequences from a single speaker rather than spontaneous communication between several speakers. This methodology makes the resulting work difficult to generalize to data collected under natural conditions.

The work undertaken in this thesis is based on call center conversations, recorded in large quantities, with each dialogue involving at least two human speakers (a customer and a sales agent). Our goal is to detect customer satisfaction through emotional expression.

In a first part, we present the scores that can be obtained on our data with models based solely on acoustic or lexical cues. We show that an approach taking only one of these cue types into account is not sufficient to obtain satisfactory results. To overcome this problem, we propose a study on the fusion of acoustic, lexical, and syntactic-semantic cues (a late-fusion sketch is given below). We show that this combination of cues yields gains over acoustic-only models, even when we rely on an approach with no manual pre-processing (automatic segmentation of the conversations, transcripts produced by an automatic speech recognition system).

In a second part, we observe that even though the hybrid acoustic/linguistic models yield interesting gains, the amount of data used in our detection models becomes a problem when we test our methods on new and highly varied data (49 hours drawn from the conversation database). To remedy this, we propose a method for enriching our training corpus: new data are selected automatically and integrated into the training set (a self-training sketch is given below). These additions allow us to double the size of our training set and to obtain gains over the initial models.

Finally, in a last part, we choose to evaluate our methods not on portions of dialogues, as is the case in most studies, but on complete conversations. For this we use the models from the previous studies (cue-fusion models, automatic enrichment methods) and add two further groups of cues (a feature-extraction sketch is given below): i) "structural" cues capturing information such as the duration of the conversation and the speaking time of each type of speaker; ii) "dialogic" cues including information such as the topic of the conversation, as well as a new concept we call "affective involvement", which aims to model the impact of the current speaker's emotional production on the other participant(s) in the conversation.
We show that when all of this information is combined, we obtain results close to human performance when it comes to determining whether a conversation is positive or negative.
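As an illustration of the cue-fusion step, here is a minimal late-fusion sketch in Python with scikit-learn. The feature layout, the choice of SVM classifiers, and the fusion weight are assumptions made for illustration only; the abstract does not prescribe this exact scheme.

```python
# Minimal late-fusion sketch: one classifier per cue type, fused posteriors.
# Hypothetical feature names and fusion weight; not the thesis's exact setup.
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import StandardScaler

def train_fusion(acoustic_X, transcripts, y, w_acoustic=0.5):
    """Train an acoustic SVM and a lexical SVM, then fuse their posteriors."""
    scaler = StandardScaler().fit(acoustic_X)
    ac_clf = SVC(probability=True).fit(scaler.transform(acoustic_X), y)

    vec = TfidfVectorizer()
    lx_clf = SVC(probability=True).fit(vec.fit_transform(transcripts), y)

    def predict(acoustic_x, transcript):
        p_ac = ac_clf.predict_proba(scaler.transform([acoustic_x]))[0]
        p_lx = lx_clf.predict_proba(vec.transform([transcript]))[0]
        # Weighted sum of class posteriors (late fusion).
        p = w_acoustic * p_ac + (1.0 - w_acoustic) * p_lx
        return ac_clf.classes_[int(np.argmax(p))]

    return predict
```

Late fusion of this kind keeps each cue stream independent, so a degraded transcript from the speech recognizer only affects the lexical posterior, not the acoustic one.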
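The corpus-enrichment step can be read as a form of self-training. The sketch below is a hypothetical version with scikit-learn: the confidence threshold and stopping rule are assumptions, not the thesis's actual selection criterion.

```python
# Self-training sketch for enriching the emotion training corpus.
# Threshold and round count are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC

def enrich_training_set(X_seed, y_seed, X_pool, threshold=0.9, max_rounds=5):
    """Iteratively move confidently auto-labeled pool examples into the seed set."""
    X_train, y_train = X_seed.copy(), y_seed.copy()
    pool = X_pool.copy()
    clf = SVC(probability=True).fit(X_train, y_train)
    for _ in range(max_rounds):
        if len(pool) == 0:
            break
        proba = clf.predict_proba(pool)
        conf = proba.max(axis=1)
        keep = conf >= threshold
        if not keep.any():
            break  # nothing confident enough left; stop enriching
        # Add the confident examples with their predicted labels, then retrain.
        X_train = np.vstack([X_train, pool[keep]])
        y_train = np.concatenate(
            [y_train, clf.classes_[proba[keep].argmax(axis=1)]])
        pool = pool[~keep]
        clf = SVC(probability=True).fit(X_train, y_train)
    return X_train, y_train, clf
```

The confidence threshold trades coverage against label noise: a lower threshold grows the training set faster but risks reinforcing the classifier's own mistakes.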
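Finally, a sketch of conversation-level feature extraction. The "affective involvement" computation here is a deliberately simplified proxy (a turn-adjacency statistic over emotional turns); the actual modeling in the thesis is richer, and the turn schema is assumed for illustration.

```python
# Sketch of structural and dialogic features over a whole conversation.
# Assumes a non-empty, time-ordered list of turns with turn-level emotion labels.
from dataclasses import dataclass
from typing import List

@dataclass
class Turn:
    speaker: str      # "agent" or "customer"
    start: float      # seconds
    end: float
    emotion: str      # e.g. "neg", "pos", "neu" from a turn-level classifier

def conversation_features(turns: List[Turn]) -> dict:
    duration = turns[-1].end - turns[0].start
    talk = {"agent": 0.0, "customer": 0.0}
    for t in turns:
        talk[t.speaker] += t.end - t.start

    # "Affective involvement" proxy: how often an emotional (non-neutral)
    # turn is answered by an emotional turn from the other speaker.
    triggers = responses = 0
    for prev, nxt in zip(turns, turns[1:]):
        if prev.emotion != "neu" and nxt.speaker != prev.speaker:
            triggers += 1
            if nxt.emotion != "neu":
                responses += 1

    return {
        "duration": duration,
        "agent_talk_ratio": talk["agent"] / max(duration, 1e-6),
        "customer_talk_ratio": talk["customer"] / max(duration, 1e-6),
        "affective_involvement": responses / triggers if triggers else 0.0,
    }
```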

[1]  Kostas Karpouzis,et al.  The HUMAINE Database: Addressing the Collection and Annotation of Naturalistic and Induced Emotional Data , 2007, ACII.

[2]  Shrikanth S. Narayanan,et al.  Toward detecting emotions in spoken dialogs , 2005, IEEE Transactions on Speech and Audio Processing.

[3]  C. Pelachaud,et al.  Emotion-Oriented Systems: The Humaine Handbook , 2011 .

[4]  Biing-Hwang Juang,et al.  Fundamentals of speech recognition , 1993, Prentice Hall signal processing series.

[5]  Vladimir Cherkassky,et al.  The Nature Of Statistical Learning Theory , 1997, IEEE Trans. Neural Networks.

[6]  Ellen Riloff,et al.  Creating Subjective and Objective Sentence Classifiers from Unannotated Texts , 2005, CICLing.

[7]  Rada Mihalcea,et al.  Co-training and Self-training for Word Sense Disambiguation , 2004, CoNLL.

[8]  Olivier Losson Modélisation du geste communicatif et réalisation d'un signeur virtuel de phrases en langue des signes grançaise , 2000 .

[9]  Lambert Schomaker,et al.  Variants of the Borda count method for combining ranked classifier hypotheses , 2000 .

[10]  K. Scherer,et al.  Minimal cues in the vocal communication of affect: Judging emotions from content-masked speech , 1972, Journal of psycholinguistic research.

[11]  K. Scherer What are emotions? And how can they be measured? , 2005 .

[12]  Levent M. Arslan,et al.  Automatic Detection of Anger in Human-Human Call Center Dialogs , 2011, INTERSPEECH.

[13]  Stacy Marsella,et al.  A domain-independent framework for modeling emotion , 2004, Cognitive Systems Research.

[14]  Laurence Vidrascu,et al.  Analyse et détection des émotions verbales dans les interactions orales. (Analysis and detection of emotions in real-life spontaneous speech) , 2007 .

[15]  Ching Y. Suen,et al.  Optimal combinations of pattern classifiers , 1995, Pattern Recognit. Lett..

[16]  Roddy Cowie,et al.  FEELTRACE: an instrument for recording perceived emotion in real time , 2000 .

[17]  Helmut Schmid,et al.  Improvements in Part-of-Speech Tagging with an Application to German , 1999 .

[18]  Tim Polzehl,et al.  Detecting real life anger , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[19]  Daniel Hirst,et al.  Prosodic parameters of French: A cross-language approach , 2001 .

[20]  Ammar Mahdhaoui,et al.  Analyse de Signaux Sociaux pour la Modélisation de l'interaction face à face. (Analysis of Social Signals for modelling Face-to-Face Interaction) , 2010 .

[21]  Laurence Devillers,et al.  Detection of real-life emotions in call centers , 2005, INTERSPEECH.

[22]  S. R. Mahadeva Prasanna,et al.  Expressive speech synthesis: a review , 2013, Int. J. Speech Technol..

[23]  Swapna Somasundaran,et al.  Manual Annotation of Opinion Categories in Meetings , 2006 .

[24]  Shlomo Argamon,et al.  Appraisal Extraction for News Opinion Analysis at NTCIR-6 , 2007, NTCIR.

[25]  Charles Goodwin,et al.  Assessments and the Construction of Context , 1992 .

[26]  Steven J. Simske,et al.  Recognition of emotions in interactive voice response systems , 2003, INTERSPEECH.

[27]  Stefan Steidl,et al.  Automatic classification of emotion related user states in spontaneous children's speech , 2009 .

[28]  Chih-Jen Lin,et al.  Combining SVMs with Various Feature Selection Strategies , 2006, Feature Extraction.

[29]  Shrikanth S. Narayanan,et al.  Politeness and frustration language in child-machine interactions , 2001, INTERSPEECH.

[30]  Cynthia Breazeal,et al.  Emotion and sociable humanoid robots , 2003, Int. J. Hum. Comput. Stud..

[31]  Laurence Devillers,et al.  Study of consumer's emotion during product interviews , 2009, 2009 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops.

[32]  Lori Lamel,et al.  Annotation and Detection of Emotion in a Task-oriented Human-Human Dialog Corpus , 2007 .

[33]  Jean-Claude Martin,et al.  Coding Emotional Events in Audiovisual Corpora , 2008, LREC.

[34]  Björn W. Schuller,et al.  The INTERSPEECH 2011 Speaker State Challenge , 2011, INTERSPEECH.

[35]  Gérard Chollet,et al.  Studies of Emotional Expressions in Oral Dialogues: towards an Extension of Universal Networking Language , 2005 .

[36]  Laurence Devillers,et al.  Annotation and detection of blended emotions in real human-human dialogs recorded in a call center , 2005, 2005 IEEE International Conference on Multimedia and Expo.

[37]  Jean-Claude Martin,et al.  Annotation of Emotions in Real-Life Video Interviews: Variability between Coders , 2006, LREC.

[38]  Paul Boersma,et al.  Praat: doing phonetics by computer , 2003 .

[39]  Roddy Cowie,et al.  Real life emotions in French and English TV video clips: an integrated annotation protocol combining continuous and discrete approaches , 2006, LREC.

[40]  Sarkis Abrilian Représentation de comportements emotionnels multimodaux spontanés : perception, annotation et synthèse. (Representation of spontaneous multimodal emotional behaviors : perception, annotation and synthesis) , 2007 .

[41]  Sukhendu Das,et al.  A Survey of Decision Fusion and Feature Fusion Strategies for Pattern Classification , 2010, IETE Technical Review.

[42]  Björn W. Schuller,et al.  Emotion recognition using imperfect speech recognition , 2010, INTERSPEECH.

[43]  Björn W. Schuller,et al.  Recognising realistic emotions and affect in speech: State of the art and lessons learnt from the first challenge , 2011, Speech Commun..

[44]  Loïc Kessous,et al.  Emotion Recognition through Multiple Modalities: Face, Body Gesture, Speech , 2008, Affect and Emotion in Human-Computer Interaction.

[45]  Chloé Clavel,et al.  Impact of spontaneous speech features on business concept detection: a study of call-centre data. , 2010, SSCS '10.

[46]  Björn W. Schuller,et al.  Segmenting into Adequate Units for Automatic Recognition of Emotion-Related Episodes: A Speech-Based Approach , 2010, Adv. Hum. Comput. Interact..

[47]  C. Darwin The Expression of the Emotions in Man and Animals , .

[48]  Antoine Cornuéjols,et al.  Apprentissage artificiel - Concepts et algorithmes , 2003 .

[49]  Björn W. Schuller,et al.  The INTERSPEECH 2009 emotion challenge , 2009, INTERSPEECH.

[50]  Björn W. Schuller,et al.  Using Multiple Databases for Training in Emotion Recognition: To Unite or to Vote? , 2011, INTERSPEECH.

[51]  Émile Benveniste 1970. ‘L’appareil formel de l’énonciation.’ Langages 17 (5): 12–18, selected 12–18. Anonymous translator. , 2014 .

[52]  A. Mehrabian Pleasure-arousal-dominance: A general framework for describing and measuring individual differences in Temperament , 1996 .

[53]  Björn W. Schuller,et al.  Acoustic emotion recognition: A benchmark comparison of performances , 2009, 2009 IEEE Workshop on Automatic Speech Recognition & Understanding.

[54]  Xavier Rodet CONTENT-BASED TRANSFORMATION OF THE EXPRESSIVITY IN SPEECH , 2007 .

[55]  Andrew Ortony,et al.  The Cognitive Structure of Emotions , 1988 .

[56]  Zhigang Deng,et al.  Analysis of emotion recognition using facial expressions, speech and multimodal information , 2004, ICMI '04.

[57]  Christopher D. Manning,et al.  Introduction to Information Retrieval: Support vector machines and machine learning on documents , 2008 .

[58]  A. Mehrabian Correlations of the PAD Emotion Scales with self-reported satisfaction in marriage and work. , 1998, Genetic, social, and general psychology monographs.

[59]  Murray S. Miron,et al.  Cross-Cultural Universals of Affective Meaning , 1975 .

[60]  Catherine Pelachaud,et al.  The HUMAINE Database , 2011 .

[61]  Matthieu Vernier,et al.  SUIVI D'OPINION DANS LE DISCOURS , 2007 .

[62]  Lillian Lee,et al.  Opinion Mining and Sentiment Analysis , 2008, Found. Trends Inf. Retr..

[63]  Jean-Claude Martin,et al.  Collection and Annotation of a Corpus of Human-Human Multimodal Interactions: Emotion and Others Anthropomorphic Characteristics , 2007, ACII.

[64]  Björn W. Schuller,et al.  What Should a Generic Emotion Markup Language Be Able to Represent? , 2007, ACII.

[65]  M. Borodovsky,et al.  Gene identification in novel eukaryotic genomes by self-training algorithm , 2005, Nucleic acids research.

[66]  Bo Pang,et al.  A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts , 2004, ACL.

[67]  Björn W. Schuller,et al.  Speaker independent emotion recognition by early fusion of acoustic and linguistic features within ensembles , 2005, INTERSPEECH.

[68]  Rafael A. Calvo,et al.  Hybrid Fusion Approach for Detecting Affects from Multichannel Physiology , 2011, ACII.

[69]  Laurence Devillers,et al.  Représentation et détection des émotions dans des dialogues enregistrés dans un centre d'appel. Des émotions complexes dans des données réelles , 2006, Rev. d'Intelligence Artif..

[70]  N. Campbell,et al.  Voice Quality : the 4 th Prosodic Dimension , 2004 .

[71]  Luca Dini,et al.  Classification d'opinions par mthodes symbolique, statistique et hybride , 2007 .

[72]  Valery A. Petrushin,et al.  EMOTION IN SPEECH: RECOGNITION AND APPLICATION TO CALL CENTERS , 1999 .

[73]  Björn Schuller,et al.  Selecting Training Data for Cross-Corpus Speech Emotion Recognition: Prototypicality vs. Generalization , 2011 .

[74]  P. Laukka,et al.  Communication of emotions in vocal expression and music performance: different channels, same code? , 2003, Psychological bulletin.

[75]  M. Morel,et al.  LE ROLE DE L'INTONATION DANS LA COMMUNICATION VOCALE DES EMOTIONS : TEST PAR LA SYNTHESE , 2004 .

[76]  David Yarowsky,et al.  Unsupervised Word Sense Disambiguation Rivaling Supervised Methods , 1995, ACL.

[77]  Jeannett Martin,et al.  The Language of Evaluation: Appraisal in English , 2005 .

[78]  Peter D. Turney Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews , 2002, ACL.

[79]  J. Russell,et al.  An approach to environmental psychology , 1974 .

[80]  Martin Ebner,et al.  Emotion Detection: Application of the Valence Arousal Space for Rapid Biological Usability Testing to Enhance Universal Access , 2009, HCI.

[81]  K. Scherer,et al.  Emotion Inferences from Vocal Expression Correlate Across Languages and Cultures , 2001 .

[82]  Laurence Devillers,et al.  Real-life emotion-related states detection in call centers: a cross-corpora study , 2010, INTERSPEECH.

[83]  Subhash C. Bagui,et al.  Combining Pattern Classifiers: Methods and Algorithms , 2005, Technometrics.

[84]  Anne Lacheret,et al.  Toward a Continuous Modeling of French Prosodic Structure: Using Acoustic Features to Predict Prominence Location and Prominence Degree , 2011, INTERSPEECH.

[85]  Klaus R. Scherer,et al.  Vocal Affect Signaling: A Comparative Approach , 1985 .

[86]  Eibe Frank,et al.  Combining Naive Bayes and Decision Tables , 2008, FLAIRS.

[87]  Roddy Cowie,et al.  Emotional speech: Towards a new generation of databases , 2003, Speech Commun..

[88]  Jen-Shin Hong,et al.  Emotion Detection in Textual Information by Semantic Role Labeling and Web Mining Techniques , 2006 .

[89]  R. Craggs Annotating emotion in dialogue – Issues and Approaches , 2003 .

[90]  Dimitrios Ververidis,et al.  A State of the Art Review on Emotional Speech Databases , 2003 .

[91]  Jean-Luc Gauvain,et al.  Multistage speaker diarization of broadcast news , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[92]  Björn W. Schuller,et al.  OpenEAR — Introducing the munich open-source emotion and affect recognition toolkit , 2009, 2009 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops.

[93]  Björn W. Schuller,et al.  The INTERSPEECH 2010 paralinguistic challenge , 2010, INTERSPEECH.

[94]  Xiaojin Zhu,et al.  --1 CONTENTS , 2006 .

[95]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[96]  Matthieu Vernier,et al.  Catégorisation des évaluations dans un corpus de blogs multi-domaine , 2009, Fouille de Données d'Opinions.

[97]  Devillers,et al.  Automatic detection of emotion from vocal expression , 2010 .

[98]  Daniel Luzzati,et al.  Recherches sur le dialogue homme-machine. Modèles linguistiques et traitements automatiques , 1991 .

[99]  P. Ekman,et al.  Approaches To Emotion , 1985 .

[100]  J. R. Landis,et al.  The measurement of observer agreement for categorical data. , 1977, Biometrics.

[101]  Björn W. Schuller,et al.  Evolutionary Feature Generation in Speech Emotion Recognition , 2006, 2006 IEEE International Conference on Multimedia and Expo.

[102]  Matthieu Vernier,et al.  Annotating Opinion - Evaluation Of Blogs , 2008 .

[103]  David Sander,et al.  Traité de psychologie des émotions , 2014 .

[104]  Björn W. Schuller,et al.  Towards More Reality in the Recognition of Emotional Speech , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[105]  Glenn Shafer,et al.  A Mathematical Theory of Evidence , 2020, A Mathematical Theory of Evidence.

[106]  Claire Cardie,et al.  Annotating Expressions of Opinions and Emotions in Language , 2005, Lang. Resour. Evaluation.

[107]  C. Clavel,et al.  Analyse et reconnaissance des manifestations acoustiques des émotions de type peur en situations anormales , 2007 .

[108]  Roddy Cowie,et al.  What a neural net needs to know about emotion words , 1999 .

[109]  Chung-Hsien Wu,et al.  Emotion Detection Based on Concept Inference and Spoken Sentence Analysis for Customer Service , 2011, INTERSPEECH.

[110]  Claire Cardie,et al.  Identifying Expressions of Opinion in Context , 2007, IJCAI.

[111]  L Dini,et al.  Opinion Classification Through Information Extraction , 2002 .

[112]  Jean-Luc Gauvain,et al.  CallSurf: Automatic Transcription, Indexing and Structuration of Call Center Conversational Speech for Knowledge Extraction and Query by Content , 2008, LREC.

[113]  Shlomo Argamon,et al.  Using appraisal groups for sentiment analysis , 2005, CIKM '05.

[114]  P. Ekman,et al.  Unmasking the face : a guide to recognizing emotions from facial clues , 1975 .

[115]  Arthur C. Graesser,et al.  Multimodal semi-automated affect detection from conversational cues, gross body language, and facial features , 2010, User Modeling and User-Adapted Interaction.