Syntactic Filtering and Content-Based Retrieval of Twitter Sentences for the Generation of System Utterances in Dialogue Systems

Sentences extracted from Twitter are regarded as a valuable resource for response generation in dialogue systems. However, selecting appropriate sentences is difficult because Twitter text is noisy. This paper proposes tackling this noise with syntactic filtering and content-based retrieval. Syntactic filtering ensures that a sentence has a structure that is valid for a system utterance, and content-based retrieval ensures that its content is relevant to the user utterance. Experimental results show that the proposed method can appropriately select high-quality Twitter sentences, significantly outperforming the baseline.
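
The two-stage idea described above can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the filter or the retrieval model, so the surface heuristics in `passes_syntactic_filter` and the Jaccard word overlap in `content_score` are assumptions chosen only to show the pipeline shape (filter for well-formedness first, then rank by relevance to the user utterance). All function names and thresholds are hypothetical.

```python
import re

# ASSUMPTION: surface patterns standing in for a real syntactic filter.
# Mentions, hashtags, and URLs are treated as Twitter-specific noise.
NOISE = re.compile(r"(@\w+|#\w+|https?://\S+)")


def passes_syntactic_filter(sentence: str) -> bool:
    """Keep sentences that look like complete, clean utterances (heuristic)."""
    if NOISE.search(sentence):
        return False
    tokens = sentence.split()
    # ASSUMPTION: length bounds and final punctuation approximate validity.
    return 3 <= len(tokens) <= 20 and sentence.rstrip().endswith((".", "!", "?"))


def tokenize(text: str) -> set:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def content_score(user_utterance: str, candidate: str) -> float:
    """Jaccard overlap between user and candidate words (assumed relevance measure)."""
    u, c = tokenize(user_utterance), tokenize(candidate)
    if not u or not c:
        return 0.0
    return len(u & c) / len(u | c)


def select_utterances(user_utterance: str, tweets: list, top_k: int = 1) -> list:
    """Filter candidates syntactically, then rank the survivors by content overlap."""
    valid = [t for t in tweets if passes_syntactic_filter(t)]
    scored = [(content_score(user_utterance, t), t) for t in valid]
    # Drop candidates with no overlap; sort the rest by descending score.
    relevant = sorted((p for p in scored if p[0] > 0), key=lambda p: p[0], reverse=True)
    return [t for _, t in relevant[:top_k]]


if __name__ == "__main__":
    tweets = [
        "@bob lol ramen http://t.co/x",
        "Ramen in Sapporo is really good.",
        "ramen",
        "The weather was lovely this morning.",
    ]
    print(select_utterances("I love ramen.", tweets))
```

Running the filter before the ranking step means noisy candidates never compete on content, which mirrors the division of labor the abstract describes; a real system would replace both heuristics with a parser-based filter and a stronger retrieval model.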
