Multimodal Dialogue Systems with InproTKs and Venice

We present extensions of the incremental processing toolkit InproTK which, together with our networking adaptors (Venice), make it possible to plug in sensors and to achieve situated, real-time, multimodal dialogue. We also describe a new module which enables the use in InproTK of the Google Web Speech API, which offers speech recognition with a very large vocabulary and a wide choice of languages. We illustrate the use of these extensions with a real-time multimodal reference resolution demo, which we make freely available, together with the toolkit itself.
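To make the sensor-plumbing idea concrete, the following is a minimal, entirely hypothetical sketch of the kind of network relay the Venice adaptors provide: a sensor process sends timestamped samples (here, a gaze coordinate) over a socket as JSON, and the dialogue-system side decodes them for downstream incremental processing. The port, message schema, and field names are illustrative assumptions, not the actual Venice wire format.

```python
import json
import socket

# Hypothetical port and message schema; the real Venice protocol
# is not specified here.
ADDR = ("127.0.0.1", 50007)

# Dialogue-system side: listen for incoming sensor messages.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(ADDR)

# Sensor side: send one timestamped gaze sample as JSON.
sample = {"timestamp": 0.123, "modality": "gaze", "x": 0.41, "y": 0.77}
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(json.dumps(sample).encode("utf-8"), ADDR)

# Decode the message; a toolkit module could then wrap it as an
# incremental unit for the rest of the pipeline.
data, _ = receiver.recvfrom(4096)
received = json.loads(data.decode("utf-8"))
print(received["modality"], received["x"], received["y"])

sender.close()
receiver.close()
```

The point of the pattern is decoupling: sensors run as separate processes (possibly on other machines) and the dialogue system only sees a stream of typed, timestamped messages.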
