Emotion classification using a CNN_LSTM-based model for smooth emotional synchronization of the humanoid robot REN-XIN

In this paper, we propose an Emotional Trigger System that gives the humanoid robot REN-XIN the ability to express emotions automatically. The Emotional Trigger is an emotion classification model trained with our proposed Word Mover's Distance (WMD)-based algorithm. Because the WMD-based Emotional Trigger System suffers from a long time delay, we also propose an enhanced Emotional Trigger System that enables smooth interaction with the robot. In the enhanced system, the Emotional Trigger is replaced by a deep neural network that combines a conventional convolutional neural network with a long short-term memory network (CNN_LSTM). In our experiments, the CNN_LSTM-based model needs only 10 milliseconds or less to finish the classification, without a decrease in accuracy, whereas the WMD-based model needs approximately 6-8 seconds to give a result. The experiments use the same subsets of the Chinese emotional corpus (Ren_CECps) as the earlier WMD experiments: one uses 50% of the data for training and 50% for testing (the 1v1 experiment), and the other uses 80% for training and 20% for testing (the 4v1 experiment). Four models are compared: WMD, CNN_LSTM, CNN and LSTM. The results show that CNN_LSTM obtains the best F1 score (0.35) in the 1v1 experiment and an F1 score nearly identical to that of WMD in the 4v1 experiment (0.366 vs. 0.367). Finally, we present demonstration videos of the same scenario to show the performance of robot control driven by the CNN_LSTM-based Emotional Trigger System and by the WMD-based Emotional Trigger System. For comparison, a fully manually controlled performance is also recorded.
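The evaluation protocol described above (a 1v1 and a 4v1 train/test split, scored by F1) can be illustrated with a minimal sketch. It rests on several assumptions not stated in the abstract: the split is a simple seeded shuffle, each sample carries a single label, and the F1 score is macro-averaged over classes. The function names `split_dataset` and `macro_f1`, the toy sentences, and the two labels are hypothetical and serve only as illustration; they are not the authors' code or data.

```python
import random


def split_dataset(samples, train_parts, test_parts, seed=0):
    """Shuffle samples and split them in the ratio train_parts:test_parts.

    Assumption: a plain seeded shuffle; the paper may build its subsets differently.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_parts // (train_parts + test_parts)
    return shuffled[:cut], shuffled[cut:]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: (sentence, emotion label) pairs.
samples = [(f"sentence {i}", "joy" if i % 2 else "sorrow") for i in range(10)]

train_1v1, test_1v1 = split_dataset(samples, 1, 1)  # 50% train / 50% test
train_4v1, test_4v1 = split_dataset(samples, 4, 1)  # 80% train / 20% test

# Scoring hypothetical predictions against hypothetical gold labels.
gold = ["joy", "sorrow", "joy", "sorrow"]
pred = ["joy", "sorrow", "sorrow", "sorrow"]
score = macro_f1(gold, pred)
```

In practice, the same classifier would be trained on each training subset and scored on the matching test subset, so that WMD, CNN_LSTM, CNN and LSTM are compared under identical splits.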
