Tracking dialog states using an Author-Topic based representation

Automatically translating textual documents from one language to another inevitably introduces translation errors. Beyond language-specific difficulties, automatic translation is even harder for spoken dialogues since, for example, their language register is far from “clean speech”. Speech analytics suffer from these translation errors. One way to tackle this difficulty is to map translations into a space of hidden topics. In the classical topic-based representation obtained from Latent Dirichlet Allocation (LDA), the word distribution of each topic is estimated automatically, but the target classes are ignored, which is a limitation in a classification task. In the main task of the fifth Dialog State Tracking Challenge (DSTC5), this class information is crucial, since the main objective is to track dialog states for sub-dialogue segments. For this challenge, we propose an original topic-based representation of each sub-dialogue that relies not only on the sub-dialogue content itself (its words) but also on the dialog state associated with it. This representation is based on the Author-Topic (AT) model, which has previously been applied successfully to a different classification task. Promising results confirm the value of this method: the AT model reaches a slightly higher F-measure than the baselines provided by the task organizers.
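To make the idea of an Author-Topic representation more concrete, here is a minimal sketch of standard collapsed Gibbs sampling for the AT model. It assumes that the dialog-state labels attached to each training sub-dialogue play the role of the "authors". The function name `author_topic_gibbs`, the hyperparameters `alpha`, `beta`, `n_iter`, and the integer encoding of words and labels are all hypothetical illustration choices. The abstract does not state the estimation procedure or settings actually used. The sampler draws, for every word token, a (label, topic) pair from the label set of its sub-dialogue. It returns `theta` (topic profile of each label) and `phi` (word distribution of each topic).

```python
import random


def author_topic_gibbs(docs, doc_authors, n_topics, vocab_size, n_authors,
                       alpha=0.5, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for the Author-Topic model (sketch).

    docs: list of documents, each a list of word ids in [0, vocab_size).
    doc_authors: per-document list of distinct author ids; here, by assumption,
        the dialog-state labels of each sub-dialogue.
    Returns (theta, phi) with theta[a][t] = P(topic t | author a)
    and phi[t][w] = P(word w | topic t).
    """
    rng = random.Random(seed)
    wt = [[0] * n_topics for _ in range(vocab_size)]  # word-topic counts
    at = [[0] * n_topics for _ in range(n_authors)]   # author-topic counts
    t_tot = [0] * n_topics                            # tokens assigned to each topic
    a_tot = [0] * n_authors                           # tokens assigned to each author

    def update(w, t, a, delta):
        # Add (delta=+1) or remove (delta=-1) one token from all count tables.
        wt[w][t] += delta
        at[a][t] += delta
        t_tot[t] += delta
        a_tot[a] += delta

    # Random initial (topic, author) assignment for every token.
    z, x = [], []
    for d, doc in enumerate(docs):
        z.append([rng.randrange(n_topics) for _ in doc])
        x.append([rng.choice(doc_authors[d]) for _ in doc])
        for i, w in enumerate(doc):
            update(w, z[d][i], x[d][i], +1)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(w, z[d][i], x[d][i], -1)
                # Joint conditional over (author, topic) pairs of this document:
                # P(z=t, x=a | rest) ~ P(w | t) * P(t | a), both smoothed.
                pairs, weights = [], []
                for a in doc_authors[d]:
                    for t in range(n_topics):
                        p_w = (wt[w][t] + beta) / (t_tot[t] + vocab_size * beta)
                        p_t = (at[a][t] + alpha) / (a_tot[a] + n_topics * alpha)
                        pairs.append((a, t))
                        weights.append(p_w * p_t)
                a, t = rng.choices(pairs, weights=weights)[0]
                z[d][i], x[d][i] = t, a
                update(w, t, a, +1)

    # Point estimates from the final counts.
    theta = [[(at[a][t] + alpha) / (a_tot[a] + n_topics * alpha)
              for t in range(n_topics)] for a in range(n_authors)]
    phi = [[(wt[w][t] + beta) / (t_tot[t] + vocab_size * beta)
            for w in range(vocab_size)] for t in range(n_topics)]
    return theta, phi


if __name__ == "__main__":
    # Toy corpus: three "sub-dialogues" over a 5-word vocabulary, two labels.
    docs = [[0, 1, 2, 1], [3, 4, 3], [0, 2, 4, 4]]
    labels = [[0], [1], [0, 1]]
    theta, phi = author_topic_gibbs(docs, labels, n_topics=2, vocab_size=5,
                                    n_authors=2, n_iter=50)
    for a, row in enumerate(theta):
        print("label", a, [round(p, 3) for p in row])
```

Under these assumptions, each row `theta[a]` is a topic profile for one dialog-state label, learned jointly from the words and the labels. This joint learning is the difference from plain LDA, which would ignore the labels. How the original system turns such profiles into per-sub-dialogue features and a classifier is not described in this abstract.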
