Extending multimedia languages to support multimodal user interactions

Research in the Multimedia community has historically focused on output modalities, through studies on timing and multimedia processing. The Multimodal Interaction community, on the other hand, has focused on user-generated modalities, through studies on Multimodal User Interfaces (MUIs). In this paper, aiming to assist the development of multimedia applications with MUIs, we propose integrating concepts from these two communities into a single high-level programming framework. The framework integrates user modalities, both user-generated (e.g., speech, gestures) and user-consumed (e.g., audiovisual, haptic), into declarative programming languages for the specification of interactive multimedia applications. To illustrate our approach, we instantiate the framework in NCL (Nested Context Language), the declarative language for developing interactive applications for Brazilian Digital TV and an ITU-T Recommendation for IPTV services. To help evaluate the approach, we discuss a usage scenario and implement it as an NCL application extended with the proposed multimodal features. We also compare the expressiveness of the multimodal NCL against existing multimedia and multimodal languages, for both input and output modalities.
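
As a purely illustrative sketch (not the paper's actual syntax), an extended NCL document might expose a speech-recognition grammar as a media object and react to a hypothetical recognition event through a standard causal connector. The "onRecognize" role, the recognition media type, and all file names below are assumptions introduced only for illustration:

    <ncl xmlns="http://www.ncl.org.br/NCL3.0/EDTVProfile">
      <head>
        <connectorBase>
          <!-- "onRecognize" is a hypothetical condition role standing in for a
               user-generated (speech) modality; standard NCL defines roles such
               as onBegin and onSelection -->
          <causalConnector id="onRecognizeStart">
            <simpleCondition role="onRecognize"/>
            <simpleAction role="start"/>
          </causalConnector>
        </connectorBase>
      </head>
      <body>
        <port id="entry" component="mainVideo"/>
        <port id="voiceEntry" component="voiceCommands"/>
        <!-- user-consumed (audiovisual) modality -->
        <media id="mainVideo" src="movie.mp4"/>
        <!-- user-generated (speech) modality: a hypothetical media type pointing
             to an SRGS grammar active while the video plays -->
        <media id="voiceCommands" src="commands.srgs" type="application/srgs+xml"/>
        <!-- overlay presented when the grammar is matched -->
        <media id="infoOverlay" src="info.png"/>
        <link xconnector="onRecognizeStart">
          <bind role="onRecognize" component="voiceCommands"/>
          <bind role="start" component="infoOverlay"/>
        </link>
      </body>
    </ncl>

The sketch keeps NCL's event-condition-action model intact: the only change is treating a recognition grammar as a first-class media object whose events can trigger presentation actions, which is the kind of integration the framework proposes.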
