Multimodal Integration - A Biological View

We present a novel methodology for building highly integrated multimodal systems. Our approach is motivated by neurological and behavioral theories of sensory perception in humans and animals. We argue that perceptual integration in multimodal systems needs to happen at all levels of the individual perceptual processes. Rather than treating each modality as a separately processed, increasingly abstracted pipeline – in which integration over abstract sensory representations occurs as the final step – we claim that integration and the sharing of perceptual information must also occur at the earliest stages of sensory processing. This paper presents our methodology for constructing multimodal systems and examines its theoretical motivation. We have followed this approach in creating the most recent version of a highly interactive environment called the Intelligent Room, and we argue that doing so has given the Intelligent Room unique perceptual capabilities and has provided insight into building similar complex multimodal systems.
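To make the contrast between late and early integration concrete, here is a hypothetical toy sketch. It is not the Intelligent Room's implementation. It assumes two modalities (audio and video) that each score five discrete positions in a room. In the late-fusion version, each pipeline commits to its own abstract decision and integration sees only those final labels; the "trust the more confident modality" tie-break is an illustrative assumption. In the early-integration version, low-level evidence is shared before either modality commits. Combining scores by a product, as if the cues were independent evidence, is one simple choice among many.

```python
# Hypothetical toy: NOT the Intelligent Room's code, only an illustration of
# integrating at the end of each pipeline vs. sharing evidence early.
# Each modality reports a non-negative score for each discrete position.

def normalize(scores):
    total = sum(scores)
    return [s / total for s in scores]

def argmax(scores):
    return max(range(len(scores)), key=lambda i: scores[i])

def late_fusion(audio_scores, video_scores):
    # Each pipeline independently commits to an abstract decision...
    audio_pos = argmax(audio_scores)
    video_pos = argmax(video_scores)
    # ...and integration only sees the two final labels.
    # Assumed tie-break rule: trust whichever decision was more confident.
    a_conf = normalize(audio_scores)[audio_pos]
    v_conf = normalize(video_scores)[video_pos]
    return audio_pos if a_conf >= v_conf else video_pos

def early_integration(audio_scores, video_scores):
    # Evidence is shared before either modality commits: per-position
    # scores are multiplied, treating the two cues as independent evidence.
    joint = [a * v for a, v in zip(audio_scores, video_scores)]
    return argmax(joint)

# Audio slightly prefers position 1; video slightly prefers position 4.
# Both, however, give substantial support to position 2.
audio = [0.10, 0.45, 0.40, 0.05, 0.00]
video = [0.00, 0.05, 0.45, 0.00, 0.50]
print(late_fusion(audio, video))        # 4
print(early_integration(audio, video))  # 2
```

In this toy case, each modality on its own is ambiguous, and late fusion must choose between two conflicting final labels. Sharing evidence earlier recovers the position that both cues jointly support.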
