Processing units in conversation: A comparative study of French and Mandarin data

Spoken language production is directed towards communication, that is, delivering comprehensible information to recipients. Segmenting speech into small units helps give discourse a sensible and interpretable structure. In real-life communication, such processing units may be defined over semantic, syntactic, or prosodic structure. Previous studies have proposed various theories of speech segmentation, mainly on the basis of qualitative analyses. The present study uses corpus-based quantitative data to examine how conversational speech in French and Mandarin is structured in terms of three processing units, and how these units interact with one another. Unit completion points were identified by semantic structure (discourse unit), prosodic pattern (prosodic unit), and part-of-speech sequences (chunk). Quantitative analyses of both languages were carried out using comparable processing procedures. This article presents our efforts to establish a dataset for two typologically divers...
