A novel focus encoding scheme for addressee detection in multiparty interaction using machine learning algorithms

Addressee detection is a fundamental task for seamless dialogue management and turn-taking in human-agent interaction. Although the addressee is implicit in dyadic interaction, detecting it becomes challenging when more than two participants are involved. This article proposes multiple addressee detection models based on smart feature selection and focus encoding schemes. The models are trained using different machine learning and deep learning algorithms. This work improves on existing baseline accuracies for addressee prediction on two datasets. In addition, the article explores the impact of different focus encoding schemes in several addressee detection cases. Finally, a strategy for implementing an addressee detection model in real time is discussed.
