How Good is Automatic Segmentation as a Multimodal Discourse Annotation Aid?

Collaborative problem solving (CPS) in teams is tightly coupled with the creation of shared meaning between participants in a situated, collaborative task. In this work, we assess the quality of different utterance segmentation techniques as an aid in annotating CPS. We (1) manually transcribe utterances in a dataset of triads collaboratively solving a problem involving dialogue and physical object manipulation, (2) annotate collaborative moves according to these gold-standard transcripts, and then (3) apply these annotations to utterances that have been automatically segmented using toolkits from Google and from OpenAI (Whisper). We show that the oracle utterances have minimal correspondence to automatically segmented speech, and that the segmentations produced by different automatic methods are also inconsistent with one another. We also show that annotating automatically segmented speech has distinct implications compared with annotating oracle utterances: most annotation schemes are designed for oracle cases, so when annotating automatically segmented utterances, annotators must draw on other information to make arbitrary judgments that other annotators may not replicate. We conclude with a discussion of how future annotation specifications can account for these needs.
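To make step (3) concrete, the following is a minimal sketch of one way gold-standard annotations could be carried over to automatically segmented utterances: each automatic segment inherits the label of the oracle utterance it overlaps most in time. The abstract does not specify the transfer procedure or the correspondence measure, so the maximum-overlap rule, the intersection-over-union score, and all names here (`Segment`, `overlap`, `iou`, `project_labels`, the labels `"A"` and `"B"`) are illustrative assumptions rather than the authors' method.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Segment:
    """A time span in seconds with an optional annotation label (hypothetical type)."""
    start: float
    end: float
    label: Optional[str] = None


def overlap(a: Segment, b: Segment) -> float:
    """Length in seconds of the temporal intersection of two segments."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two time spans, one possible correspondence score."""
    inter = overlap(a, b)
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0


def project_labels(gold: List[Segment], auto: List[Segment]) -> List[Segment]:
    """Assign each automatic segment the label of its maximally overlapping gold utterance.

    Assumed rule: segments with no temporal overlap at all stay unlabeled.
    """
    projected = []
    for seg in auto:
        best = max(gold, key=lambda g: overlap(g, seg), default=None)
        if best is None or overlap(best, seg) == 0.0:
            projected.append(Segment(seg.start, seg.end, None))
        else:
            projected.append(Segment(seg.start, seg.end, best.label))
    return projected


# Hypothetical data: two oracle utterances and three automatic segments.
gold = [Segment(0.0, 2.0, "A"), Segment(2.0, 5.0, "B")]
auto = [Segment(0.0, 3.0), Segment(3.0, 5.0), Segment(6.0, 7.0)]
projected = project_labels(gold, auto)
```

Under this rule the first automatic segment straddles both oracle utterances and must pick one label, which illustrates the kind of arbitrary judgment the abstract describes when segment boundaries do not match the oracle ones.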
