Towards automatic detection of reported speech in dialogue using prosodic cues

The phenomenon of reported speech, whereby we quote the words, thoughts and opinions of others or recount past dialogue, is widespread in conversational speech. Detecting such quotations automatically has numerous applications, for example in enhancing automatic transcription or spoken language understanding. However, the task is challenging, not least because lexical cues of quotations are frequently ambiguous or absent in spoken language. The aim of this paper is to identify potential prosodic cues of reported speech that could be used, along with lexical ones, to automatically detect quotations and ascribe them to their rightful source, that is, to reconstruct their attribution relations. To do so, we analyze SARC, a small corpus of telephone conversations that we have annotated with attribution relations. The statistical analysis of the data shows that variations in pitch, intensity, and timing features can be exploited as cues of quotations. Furthermore, we build an SVM classifier that integrates lexical and prosodic cues to automatically detect quotations in speech, and it performs significantly better than chance.
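
As an illustration of what a prosodic cue of this kind might look like in practice, the sketch below computes a speaker-normalized shift in a feature such as pitch between a candidate quotation span and the words preceding it. This is not the feature set used in the paper: the function names, the z-score normalization, and all numeric values are assumptions made for the example.

```python
from statistics import mean, stdev


def zscore(values, reference):
    """Normalize values against a speaker-level reference sample (hypothetical helper)."""
    mu, sigma = mean(reference), stdev(reference)
    return [(v - mu) / sigma for v in values]


def prosodic_shift(span, context, speaker_reference):
    """Mean z-scored value inside the candidate span minus the mean in the preceding context."""
    span_z = zscore(span, speaker_reference)
    context_z = zscore(context, speaker_reference)
    return mean(span_z) - mean(context_z)


# Made-up per-word mean pitch values in Hz, for illustration only.
speaker_pitch = [180.0, 190.0, 200.0, 210.0, 220.0]  # all words by this speaker
context_pitch = [185.0, 195.0]                       # words before the candidate span
span_pitch = [215.0, 225.0]                          # words inside the candidate span

shift = prosodic_shift(span_pitch, context_pitch, speaker_pitch)
print(f"pitch shift in speaker standard deviations: {shift:.2f}")
```

A positive value means the candidate span is produced with higher pitch than its immediate context, relative to the speaker's own range. The same computation could plausibly be applied to intensity or timing measurements, and the resulting values passed to a classifier alongside lexical features.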
