Temporal alignment using the incremental unit framework

We propose a method for temporal alignment, a precondition of meaningful fusion, in multimodal systems, built on the incremental unit dialogue system framework. The framework gives the system flexibility in how it handles alignment: it can either delay a modality for a specified amount of time, or revoke (i.e., backtrack) processed information so that multiple information sources can be processed jointly. We evaluate our approach in an offline experiment with multimodal data and find that the incremental framework is flexible and shows promise as a solution to the problem of temporal alignment in multimodal systems.
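The two alignment strategies described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation (which builds on the incremental unit framework of toolkits such as InproTK): the names `IU` and `TemporalAligner`, the `delay` parameter, and the event log are all assumptions made for the example. Each incremental unit carries a timestamp; the aligner either holds a unit back until its delay window has elapsed, or, when a unit arrives out of order, revokes the already-committed later units and re-commits everything in timestamp order.

```python
from dataclasses import dataclass, field

@dataclass(order=True)
class IU:
    """A hypothetical incremental unit: ordered by timestamp only."""
    timestamp: float
    modality: str = field(compare=False)
    payload: str = field(compare=False)

class TemporalAligner:
    """Merge IUs from several modalities into timestamp order.

    Two strategies, mirroring the abstract:
      * delay  -- hold a fast modality back by `delay` seconds
                  before committing its units downstream;
      * revoke -- if a unit arrives with an earlier timestamp than
                  units already committed, retract those units and
                  re-commit everything in order.
    """
    def __init__(self, delay=0.0):
        self.delay = delay
        self.committed = []   # IUs already passed downstream, in order
        self.log = []         # ("add" | "revoke", IU) events, for inspection

    def receive(self, iu, now):
        # Delay strategy: not enough wall-clock time has passed yet;
        # the caller should buffer the unit and retry later.
        if now - iu.timestamp < self.delay:
            return False
        # Revoke strategy: retract committed units that are out of order.
        reemit = []
        while self.committed and self.committed[-1].timestamp > iu.timestamp:
            late = self.committed.pop()
            self.log.append(("revoke", late))
            reemit.append(late)
        # Re-commit the new unit plus any revoked units, in timestamp order.
        for unit in [iu] + sorted(reemit):
            self.committed.append(unit)
            self.log.append(("add", unit))
        return True
```

A usage example: a gaze unit arriving after a later speech unit triggers a revoke, after which both modalities end up jointly ordered.

```python
aligner = TemporalAligner()
aligner.receive(IU(0.10, "speech", "take"), now=1.0)
aligner.receive(IU(0.30, "speech", "that"), now=1.0)
aligner.receive(IU(0.20, "gaze", "object-3"), now=1.0)  # late arrival revokes "that"
print([u.payload for u in aligner.committed])
```

The design choice here is the trade-off the abstract names: a larger `delay` avoids revokes entirely at the cost of latency, while `delay=0.0` commits eagerly and relies on revocation to repair ordering.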
