Towards reliable multimodal sensing in aware environments

A prototype system for implementing a reliable sensor network for large-scale smart environments is presented. Most applications within any form of smart environment (rooms, offices, homes, etc.) depend on reliable who, where, when, and what information about its inhabitants (users). This information can be inferred from different sensors spread throughout the space. However, isolated sensing technologies provide limited information under the varying, dynamic, and long-term (24/7) scenarios inherent in applications for intelligent environments. In this paper, we present a prototype system that provides an infrastructure for leveraging the strengths of different sensors and processes used for the interpretation of their collective data. We describe the needs of such systems, propose an architecture to deal with such multi-modal fusion, and discuss the initial set of sensors and processes used to address those needs.
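The abstract does not detail the fusion mechanism itself, but the core idea of combining evidence from heterogeneous sensors can be illustrated with one simple strategy: confidence-weighted averaging of per-modality location estimates. The sketch below is an assumption for illustration only (the `Estimate` class, `fuse` function, and the example sensors are all hypothetical), not the architecture the paper actually proposes.

```python
from dataclasses import dataclass

@dataclass
class Estimate:
    """A 2-D location estimate produced by one sensing modality."""
    x: float
    y: float
    confidence: float  # in (0, 1]; higher means the modality is more trusted

def fuse(estimates):
    """Fuse per-sensor estimates by confidence-weighted averaging.

    This is one of the simplest fusion rules; a real system would
    also handle time alignment, outliers, and sensor dropout.
    """
    total = sum(e.confidence for e in estimates)
    x = sum(e.x * e.confidence for e in estimates) / total
    y = sum(e.y * e.confidence for e in estimates) / total
    return x, y

# Hypothetical readings: a vision tracker and a microphone array
# localizing the same person with different certainty.
camera = Estimate(x=2.0, y=3.0, confidence=0.9)
audio = Estimate(x=2.4, y=3.2, confidence=0.3)
print(fuse([camera, audio]))
```

The fused position is pulled toward the camera's estimate because its confidence is higher, which is the basic behavior any weighted fusion scheme should exhibit.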
