A Machine Learning Approach to Speech Act Classification Using Function Words

This paper presents a novel technique for classifying sentences as Dialogue Acts, based on the structural information carried by function words. It focuses on distinguishing questions from non-questions, a generally useful task in agent-based systems. The proposed technique extracts salient features by replacing function words with numeric tokens and replacing each content word with a standard numeric wildcard token. The Decision Tree, a well-established classification technique, was chosen as the classifier. Experiments provide evidence of the potential for highly effective classification, with a significant result on a challenging dataset achieved before any optimisation of feature extraction.
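The feature extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function-word list, the specific token values, the padding token, and the fixed vector length (`FUNCTION_WORDS`, `CONTENT_TOKEN`, `PAD_TOKEN`, `MAX_LEN`) are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import re

# Hypothetical, deliberately tiny function-word list (an assumption;
# the abstract does not say which list the paper uses).
FUNCTION_WORDS = ["a", "the", "is", "are", "do", "does", "what", "where",
                  "who", "how", "you", "it", "to", "of", "in", "can"]

# Map each function word to a numeric token, starting at 1.
FUNCTION_TOKEN = {word: i for i, word in enumerate(FUNCTION_WORDS, start=1)}

CONTENT_TOKEN = 0  # assumed wildcard token shared by all content words
PAD_TOKEN = -1     # assumed padding value for short sentences
MAX_LEN = 8        # assumed fixed feature-vector length


def encode(sentence, max_len=MAX_LEN):
    """Turn a sentence into a fixed-length vector of numeric tokens."""
    words = re.findall(r"[a-z']+", sentence.lower())
    # Function words keep their identity; every content word collapses
    # to the same wildcard token.
    tokens = [FUNCTION_TOKEN.get(w, CONTENT_TOKEN) for w in words][:max_len]
    # Pad so that every sentence yields a vector of the same length.
    return tokens + [PAD_TOKEN] * (max_len - len(tokens))


print(encode("What is the capital of France?"))
# [7, 3, 2, 0, 14, 0, -1, -1]
```

Fixed-length vectors of this kind could then be passed, together with question/non-question labels, to any decision tree learner. The padding scheme is one simple way to obtain equal-length inputs and is not taken from the paper.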
