Predicting User Satisfaction in Spoken Dialog System Evaluation With Collaborative Filtering

We propose a collaborative filtering (CF) model to predict user satisfaction in spoken dialog system (SDS) evaluation. Inspired by the use of CF in recommendation systems, where a user's preference for a new item is assumed to resemble their preferences for similar items rated previously, we adapt the idea to predict user evaluations of unrated dialogs from the ratings received by similar dialogs. Dialog ratings are gathered by crowdsourcing through Amazon Mechanical Turk, and a linear regression model (LRM) based on the PARADISE framework serves as the reference baseline. We present two versions of the CF model. The item-based collaborative filtering model (ICFM) clusters the rated dialogs and builds an LRM for each cluster; the rating of an unseen dialog is then predicted by the LRM of its most similar cluster. The extended ICFM (EICFM) further separates dialog features into user-related and system-related groups and builds separate LRMs for each group. Experimental results on dialogs from the Let's Go! system show that both ICFM and EICFM significantly improve the proportion of variability explained over the baseline LRM. We also demonstrate that the CF model generalizes to a new dialog corpus drawn from the systems in the Spoken Dialog Challenge (SDC) 2010.
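To make the ICFM procedure concrete, the following is a minimal sketch of its three steps (cluster rated dialogs, fit one LRM per cluster, predict an unseen dialog with the LRM of its nearest cluster). The synthetic features, rating scale, cluster count, and the use of k-means and scikit-learn are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal ICFM sketch: per-cluster linear regression models (LRMs).
# Features, rating scale, and k-means clustering are assumptions for
# illustration; the paper's actual features and clustering may differ.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical interaction features for 200 rated dialogs (e.g., ASR
# error rate, number of turns, task success) and crowdsourced ratings.
X_rated = rng.random((200, 3))
y_rated = rng.random(200) * 5          # satisfaction ratings, 0-5 scale

# Step 1: cluster the rated dialogs by their feature vectors.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X_rated)

# Step 2: build one LRM per cluster from that cluster's dialogs.
cluster_lrms = {}
for k in range(kmeans.n_clusters):
    members = kmeans.labels_ == k
    cluster_lrms[k] = LinearRegression().fit(X_rated[members], y_rated[members])

# Step 3: predict an unseen dialog's rating with the LRM of its most
# similar (nearest-centroid) cluster.
def predict_rating(x_new):
    k = kmeans.predict(x_new.reshape(1, -1))[0]
    return cluster_lrms[k].predict(x_new.reshape(1, -1))[0]

print(predict_rating(rng.random(3)))
```

Under the same assumptions, EICFM would split the feature columns into user-related and system-related groups and fit a separate LRM on each group within a cluster, rather than a single LRM over all features.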
