Modeling Psychotherapy Dialogues with Kernelized Hashcode Representations: A Nonparametric Information-Theoretic Approach.

We propose a novel dialogue modeling framework, the first nonparametric, kernel-function-based approach to dialogue modeling, which learns kernelized hashcodes as compressed text representations. Unlike traditional deep learning models, it handles relatively small datasets well while also scaling to large ones. We also derive a novel lower bound on mutual information and use it as a model-selection criterion that favors representations with better alignment between the utterances of participants in a collaborative dialogue setting, as well as higher predictability of the generated responses. As demonstrated on three real-life datasets, most prominently psychotherapy sessions, the proposed approach significantly outperforms several state-of-the-art neural-network-based dialogue systems, both in computational efficiency, reducing training time from days or weeks to hours, and in response quality, achieving an order-of-magnitude improvement over competitors in how often human evaluators chose it as the best model.
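To make the idea of a kernelized hashcode concrete, the following is a minimal sketch under stated assumptions, not the method from the paper. It assumes a Jaccard token-overlap similarity as a stand-in kernel and a nearest-neighbor style hash function: each bit records which of two random groups of reference utterances contains the most similar one. The names `token_kernel`, `build_hash_functions`, and `hashcode` are hypothetical and introduced only for illustration.

```python
import random


def token_kernel(a: str, b: str) -> float:
    """Jaccard similarity over lowercase token sets (assumed stand-in kernel)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def build_hash_functions(reference, num_bits, subset_size, seed=0):
    """Draw one random subset of reference indices per bit, split into two groups."""
    rng = random.Random(seed)
    half = subset_size // 2
    functions = []
    for _ in range(num_bits):
        idx = rng.sample(range(len(reference)), subset_size)
        functions.append((idx[:half], idx[half:]))
    return functions


def hashcode(text, reference, functions):
    """Binary code: bit is 1 when the nearest reference lies in the first group."""
    bits = []
    for group_a, group_b in functions:
        sim_a = max(token_kernel(text, reference[i]) for i in group_a)
        sim_b = max(token_kernel(text, reference[i]) for i in group_b)
        bits.append(1 if sim_a > sim_b else 0)
    return tuple(bits)


if __name__ == "__main__":
    # Illustrative utterances only; not taken from any dataset in the paper.
    reference = [
        "i feel sad today",
        "tell me more about that",
        "i could not sleep",
        "how did that make you feel",
        "work has been stressful",
        "what happened next",
    ]
    fns = build_hash_functions(reference, num_bits=8, subset_size=4, seed=1)
    print(hashcode("i feel stressed about work", reference, fns))
```

In this sketch the code length (`num_bits`) and the reference subset size are free parameters; a model-selection criterion such as the mutual information bound mentioned in the abstract is the kind of signal one might use to choose among such configurations.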
