A hardware and software architecture to deal with multimodal and collaborative interactions in multiuser virtual reality environments

Most advanced immersive devices provide a collaborative environment in which several users have their own head-tracked stereoscopic point of view. Combined with commonly used interactive features such as voice and gesture recognition, 3D mice, haptic feedback, and spatialized audio rendering, these environments should faithfully reproduce a real context. However, even though many studies have been carried out on multimodal systems, we are far from definitively solving the issue of multimodal fusion, which consists in merging multimodal events coming from users and devices into interpretable commands performed by the application. Multimodality and collaboration have often been studied separately, despite the fact that these two aspects share interesting similarities. We discuss how we address this problem through the design and implementation of a supervisor able to deal with both multimodal fusion and collaborative aspects. The aim of this supervisor is to merge users' input from virtual reality devices in order to control immersive multi-user applications. We approach this problem from a practical point of view, because the main requirements of the supervisor were defined according to an industrial task proposed by our automotive partner, which has to be performed with multimodal and collaborative interactions in a co-located multi-user environment. In this task, two co-located workers of a virtual assembly chain have to cooperate to insert a seat into the bodywork of a car, using haptic devices to feel collisions and to manipulate objects, and combining speech recognition and two-handed gesture recognition as multimodal instructions. Besides the architectural aspects of this supervisor, we describe how we ensure the modularity of our solution so that it can be applied to different virtual reality platforms, interactive contexts, and virtual contents. A virtual context observer included in this supervisor was specifically designed to be independent of the content of the virtual scene of the targeted application, and is used to report high-level interactive and collaborative events. This context observer allows the supervisor to merge these interactive and collaborative events, but it is also used to deal with new issues coming from our observation of two co-located users performing this assembly task in an immersive device. We highlight the fact that when speech recognition features are provided to both users, it is necessary to detect automatically, according to the interactive context, whether vocal instructions must be translated into commands to be performed by the machine, or whether they are part of the natural communication necessary for collaboration. Information from the context observer indicating that a user is looking at his or her collaborator is important for detecting whether the user is talking to his or her partner. Moreover, as the users are physically co-located and head tracking is used to provide high-fidelity stereoscopic rendering and natural walking navigation in the virtual scene, we also have to deal with collisions and screen occlusions between the co-located users in the physical workspace. The working area and focus of each user, computed and reported by the context observer, are necessary to prevent or avoid these situations.
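The following sketch illustrates one way the gaze-based routing of speech events described above could work; it is a hypothetical, minimal example and not the authors' implementation. All names (UserState, is_looking_at, route_speech_event) and the 20-degree gaze cone are assumptions made for illustration.

```python
# Illustrative sketch (hypothetical): route a recognized utterance either to the
# multimodal fusion engine (system command) or to human-to-human communication,
# depending on whether the speaker is looking at the co-located collaborator.
from dataclasses import dataclass
import math

@dataclass
class UserState:
    head_position: tuple   # (x, y, z) from head tracking
    gaze_direction: tuple  # view vector (need not be pre-normalized)

def _normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v) if n > 0 else v

def is_looking_at(observer: UserState, target: UserState, threshold_deg: float = 20.0) -> bool:
    """True if the observer's gaze points toward the target's head within a cone."""
    to_target = _normalize(tuple(t - o for o, t in zip(observer.head_position, target.head_position)))
    cos_angle = sum(g * t for g, t in zip(_normalize(observer.gaze_direction), to_target))
    return cos_angle >= math.cos(math.radians(threshold_deg))

def route_speech_event(speaker: UserState, partner: UserState, utterance: str) -> str:
    """Decide whether a recognized utterance is a system command or inter-user talk."""
    if is_looking_at(speaker, partner):
        return f"communication: '{utterance}' not forwarded to the fusion engine"
    return f"command: '{utterance}' forwarded to multimodal fusion"

# Example: the speaker faces away from the partner, so the utterance is treated as a command.
speaker = UserState(head_position=(0.0, 1.7, 0.0), gaze_direction=(0.0, 0.0, -1.0))
partner = UserState(head_position=(1.5, 1.7, 0.0), gaze_direction=(-1.0, 0.0, 0.0))
print(route_speech_event(speaker, partner, "attach the seat"))
```

In a real supervisor this decision would also draw on other context-observer outputs (working area, focus, ongoing manipulation), but the gaze test alone already captures the command-versus-conversation distinction highlighted in the abstract.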
