Extraction of pragmatic and semantic salience from spontaneous spoken English

This paper computationally models two linguistic concepts, contrast and focus, in order to extract pragmatic and semantic salience from spontaneous speech. Contrast and focus have been widely investigated in modern linguistics as categories that link intonation to information and discourse structure. This paper demonstrates automatic tagging of contrast and focus for robust spontaneous speech understanding in a tutorial dialogue system. In particular, we propose two new transcription tasks and show that human labels in both tasks can be replicated automatically. First, we define the focus kernel as the set of words that contain novel information, that is, information neither presupposed by the interlocutor nor contained in the preceding words of the utterance. We propose detecting the focus kernel using a word dissimilarity measure, part-of-speech tags, and prosodic measurements, including duration, pitch, energy, and our proposed spectral balance cepstral coefficients. To measure word dissimilarity, we test a linear combination of ontological and statistical dissimilarity measures previously published in the computational linguistics literature. Second, we propose identifying symmetric contrast, which consists of a set of words that are parallel or symmetric in linguistic structure but distinct or contrastive in meaning. Symmetric contrast is identified in a manner similar to focus kernel detection. The proposed extraction of symmetric contrast and focus kernel was tested on a Wizard-of-Oz corpus collected in the tutoring dialogue scenario. The corpus consists of 630 utterances longer than a single word or phrase, containing approximately 5700 words and 48 minutes of speech. The tests used speech waveforms together with manual orthographic transcriptions and yielded accuracies of 83.8% for focus kernel detection and 92.8% for symmetric contrast detection.
Our tests also showed that the spectral balance cepstral coefficients, the semantic dissimilarity measure, and part-of-speech tags played important roles in both symmetric contrast detection and focus kernel detection.
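The abstract states that word dissimilarity is measured by a linear combination of ontological and statistical measures, and that a focus kernel word carries information not found in the interlocutor's context or earlier in the utterance. The sketch below illustrates only that dissimilarity feature, under loudly stated assumptions: the ontological measure is an edge-counting path distance through a toy is-a hierarchy, the statistical measure is one minus the cosine similarity of toy co-occurrence vectors, and the weight of 0.5 is arbitrary. `TOY_HYPERNYMS`, `TOY_VECTORS`, `novelty_scores`, and the other helpers are hypothetical names introduced for illustration, not the paper's implementation; the paper's detector also uses part-of-speech and prosodic features, which are not modeled here.

```python
import math

# Hypothetical toy is-a hierarchy (child -> parent); a real system would
# use a lexical ontology instead.
TOY_HYPERNYMS = {
    "square": "shape", "circle": "shape", "shape": "entity",
    "red": "color", "blue": "color", "color": "entity",
}

# Hypothetical toy co-occurrence counts standing in for corpus statistics.
TOY_VECTORS = {
    "square": {"draw": 2.0, "corner": 3.0},
    "circle": {"draw": 2.0, "round": 3.0},
    "red": {"paint": 3.0, "bright": 1.0},
    "blue": {"paint": 3.0, "dark": 1.0},
}


def path_to_root(word, hypernyms):
    """Return [word, parent, grandparent, ...] up to the root."""
    path = [word]
    while path[-1] in hypernyms:
        path.append(hypernyms[path[-1]])
    return path


def ontological_dissimilarity(w1, w2, hypernyms, max_path=4):
    """Edge count through the lowest common ancestor, scaled to [0, 1]."""
    if w1 == w2:
        return 0.0
    p1, p2 = path_to_root(w1, hypernyms), path_to_root(w2, hypernyms)
    depth_in_p2 = {node: i for i, node in enumerate(p2)}
    for i, node in enumerate(p1):
        if node in depth_in_p2:  # first shared node = lowest common ancestor
            return min((i + depth_in_p2[node]) / max_path, 1.0)
    return 1.0  # no common ancestor: maximally dissimilar


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def statistical_dissimilarity(w1, w2, vectors):
    """1 - cosine similarity of co-occurrence vectors; unseen words score 1.0."""
    if w1 == w2:
        return 0.0
    return 1.0 - cosine(vectors.get(w1, {}), vectors.get(w2, {}))


def combined_dissimilarity(w1, w2, hypernyms, vectors, weight=0.5):
    """Linear combination; the weight is an assumed value, not from the paper."""
    return (weight * ontological_dissimilarity(w1, w2, hypernyms)
            + (1.0 - weight) * statistical_dissimilarity(w1, w2, vectors))


def novelty_scores(utterance, prior_context, hypernyms, vectors, weight=0.5):
    """Per word: minimum dissimilarity to the interlocutor's context and to
    the preceding words of the same utterance (1.0 if there is no context)."""
    scores = []
    for i, word in enumerate(utterance):
        context = list(prior_context) + utterance[:i]
        if not context:
            scores.append(1.0)
            continue
        scores.append(min(combined_dissimilarity(word, c, hypernyms, vectors, weight)
                          for c in context))
    return scores


scores = novelty_scores(["red", "circle"], ["blue", "square"], TOY_HYPERNYMS, TOY_VECTORS)
print([round(s, 3) for s in scores])  # [0.3, 0.596]
```

Taking the minimum over context words means a word counts as novel only if it is dissimilar to everything already said, which mirrors the "neither presupposed nor preceding" wording of the definition. In a full detector, such a score would be one input feature alongside part-of-speech tags and prosodic measurements rather than a decision on its own.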