Automatic Induction of Dialogue Structure from the Companions Dialogue Corpus

We present a mechanism for learning repeated substructures of dialogue from an appropriately annotated dialogue corpus. We hypothesised that a dialogue corpus could be segmented automatically into ‘chunks’ corresponding roughly to ‘game’-like structures. Our dialogue system manages dialogue with a set of hand-crafted augmented transition networks (ATNs), which we call ‘Dialogue Action Forms’ (DAFs). DAFs deterministically guide the progress of a conversation with the user and are individuated by conversational topic: when the conversation moves onto a new topic, whether at the initiative of the system or the user, a new DAF is pushed onto the stack. We hoped to induce these DAFs automatically from a dialogue corpus. This required deriving two kinds of DAF and then combining them: substantive DAFs, such as might be used to learn about someone’s family (names, ages, relationships to the user, etc.), and meta-conversational DAFs, which manage the dialogue at the ‘exchange structure’ or ‘game’ level. We also hoped to find generalisations that grouped the discovered topic chunks into classes of functionally similar chunks. Our method was to devise a text segmentation technique specifically suited to dialogue and to use it in conjunction with a tool for analysing dialogue structure in terms of dialogue act sequences.
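
The stack discipline described above can be illustrated with a minimal sketch. It is not the system's implementation: the `DAF` and `DialogueManager` classes, their methods, and the example prompts are all hypothetical, and a real DAF is an ATN rather than the fixed list of prompts assumed here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DAF:
    """Hypothetical stand-in for a Dialogue Action Form: one topic, scripted prompts."""
    topic: str
    prompts: List[str]
    position: int = 0  # index of the next prompt to issue

    def next_prompt(self) -> Optional[str]:
        """Return the next prompt, or None once this form is exhausted."""
        if self.position >= len(self.prompts):
            return None
        prompt = self.prompts[self.position]
        self.position += 1
        return prompt


@dataclass
class DialogueManager:
    """Hypothetical manager that keeps the active DAFs on a stack."""
    stack: List[DAF] = field(default_factory=list)

    def change_topic(self, daf: DAF) -> None:
        # A topic change (by system or user) pushes a new DAF onto the stack.
        self.stack.append(daf)

    def next_prompt(self) -> Optional[str]:
        # Exhausted DAFs are popped, so the suspended topic underneath resumes.
        while self.stack:
            prompt = self.stack[-1].next_prompt()
            if prompt is not None:
                return prompt
            self.stack.pop()
        return None


manager = DialogueManager()
manager.change_topic(DAF("family", ["Who is in your family?", "How old are they?"]))
first = manager.next_prompt()    # prompt from the 'family' DAF
manager.change_topic(DAF("holiday", ["Where did you go?"]))
second = manager.next_prompt()   # prompt from the newly pushed 'holiday' DAF
third = manager.next_prompt()    # 'holiday' is exhausted, so 'family' resumes
```

Under these assumptions, pushing a new form suspends the current topic rather than discarding it, which is why the interrupted ‘family’ form picks up where it left off once the ‘holiday’ form completes.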
