Using Multimodal Information to Enhance Addressee Detection in Multiparty Interaction

Addressee detection is an important challenge to tackle in order to improve dialogical interaction between humans and agents. This detection, essential for turn-taking models, is a hard task in multiparty settings. Both rule-based and statistical approaches have been explored. Statistical approaches, particularly deep learning, require large amounts of training data. However, smart feature selection can improve addressee detection on small datasets, especially when multimodal information is available. In this article, we propose a statistical approach based on smart feature selection that exploits contextual and multimodal information for addressee detection. The results show that our model outperforms an existing baseline.
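To make the idea concrete, the following is a minimal sketch of how selected contextual and multimodal cues could feed an addressee classifier. The feature names (gaze at the agent, an explicit vocative, the previous turn's addressee) and the weighted-vote scorer are illustrative assumptions, not the paper's actual feature set or model.

```python
from dataclasses import dataclass

@dataclass
class UtteranceContext:
    # Hypothetical cues; the names below are illustrative only.
    speaker_gazes_at_agent: bool    # visual cue: gaze target during the utterance
    contains_agent_name: bool       # lexical cue: explicit vocative ("Hey, robot...")
    prev_addressee_was_agent: bool  # contextual cue: addressee of the previous turn

def extract_features(ctx: UtteranceContext) -> list[int]:
    """Map the selected multimodal and contextual cues to a binary feature vector."""
    return [
        int(ctx.speaker_gazes_at_agent),
        int(ctx.contains_agent_name),
        int(ctx.prev_addressee_was_agent),
    ]

def addressed_to_agent(ctx: UtteranceContext,
                       weights=(0.5, 0.3, 0.2),
                       threshold=0.5) -> bool:
    """Weighted vote over the selected features.

    A trained statistical model (e.g. logistic regression) would learn these
    weights from data; fixed weights here just keep the sketch self-contained.
    """
    score = sum(w * f for w, f in zip(weights, extract_features(ctx)))
    return score >= threshold
```

In a real pipeline, the hand-picked weights would be replaced by a classifier fitted on the annotated corpus, with feature selection pruning cues that do not help on a small dataset.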
