
Multimodal identification and localization of users in a smart environment

Detecting the location and identity of users is a first step toward building context-aware applications for technologically endowed environments. We propose a system that combines motion detection, person tracking, face identification, feature-based identification, audio-based localization, and audio-based identification modules, and fuses their outputs with particle filters to obtain robust localization and identification. The data streams are processed by the generic client-server middleware SmartFlow, which yields a flexible architecture that runs across different platforms.
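The abstract says the module outputs are fused with particle filters but gives no details of the filter. As a rough illustration of the general technique only, the Python sketch below is a bootstrap particle filter that tracks one user's 2-D floor position from whichever modules report in a given frame (for example a video person tracker and audio-based localization). It is not the authors' implementation: the random-walk motion model, the isotropic Gaussian measurement model, the assumption that modalities are conditionally independent given the position, the room size, the noise values, and every name (`FusionParticleFilter`, `systematic_resample`, `log_likelihood`) are hypothetical choices made for this example. Identity fusion is not modelled.

```python
import math
import random


def log_likelihood(px, py, zx, zy, sigma):
    """Log of an isotropic 2-D Gaussian likelihood, up to an additive constant."""
    return -((px - zx) ** 2 + (py - zy) ** 2) / (2.0 * sigma ** 2)


def systematic_resample(particles, weights, rng):
    """Systematic resampling: draw len(particles) copies with probability
    proportional to `weights`, which must already sum to 1."""
    n = len(particles)
    cumsum, total = [], 0.0
    for w in weights:
        total += w
        cumsum.append(total)
    cumsum[-1] = 1.0  # guard against floating-point round-off
    u0 = rng.random() / n
    out, i = [], 0
    for k in range(n):
        u = u0 + k / n
        while cumsum[i] < u:
            i += 1
        out.append(particles[i])
    return out


class FusionParticleFilter:
    """Hypothetical bootstrap particle filter over one user's (x, y) position."""

    def __init__(self, n, room_w, room_h, motion_sigma, seed=0):
        self.rng = random.Random(seed)
        self.motion_sigma = motion_sigma
        # No prior knowledge: spread particles uniformly over the room (metres).
        self.particles = [
            (self.rng.uniform(0.0, room_w), self.rng.uniform(0.0, room_h))
            for _ in range(n)
        ]

    def predict(self):
        # Assumed random-walk motion model with Gaussian steps.
        s = self.motion_sigma
        self.particles = [
            (x + self.rng.gauss(0.0, s), y + self.rng.gauss(0.0, s))
            for x, y in self.particles
        ]

    def update(self, observations):
        """observations: list of (zx, zy, sigma), one per module that reported."""
        if not observations:
            return  # no module reported this frame: keep the predicted particles
        # Modalities assumed conditionally independent, so log-likelihoods add.
        log_w = [
            sum(log_likelihood(px, py, zx, zy, s) for zx, zy, s in observations)
            for px, py in self.particles
        ]
        m = max(log_w)  # subtract the max before exp() to avoid underflow
        weights = [math.exp(lw - m) for lw in log_w]
        total = sum(weights)
        weights = [w / total for w in weights]
        self.particles = systematic_resample(self.particles, weights, self.rng)

    def estimate(self):
        # Particles are equally weighted after resampling, so use the plain mean.
        n = len(self.particles)
        return (
            sum(x for x, _ in self.particles) / n,
            sum(y for _, y in self.particles) / n,
        )


if __name__ == "__main__":
    # Simulated data: a user standing still at (2.0, 1.5) in a 6 m x 4 m room.
    pf = FusionParticleFilter(n=1000, room_w=6.0, room_h=4.0, motion_sigma=0.05, seed=1)
    data_rng = random.Random(42)
    for t in range(30):
        # Video tracker reports every frame (assumed noise 0.3 m).
        obs = [(2.0 + data_rng.gauss(0.0, 0.3), 1.5 + data_rng.gauss(0.0, 0.3), 0.3)]
        if t % 3 == 0:
            # Audio localization only while the user speaks (assumed noise 0.5 m).
            obs.append((2.0 + data_rng.gauss(0.0, 0.5), 1.5 + data_rng.gauss(0.0, 0.5), 0.5))
        pf.predict()
        pf.update(obs)
    print("estimated position: (%.2f, %.2f)" % pf.estimate())
```

Each module that reports in a frame contributes one factor to the particle weight, so a silent module (for instance audio localization while the user is not speaking) simply drops out of that update instead of requiring special handling. Working in log space and subtracting the maximum before exponentiating keeps the weights from underflowing when several sharp likelihoods are multiplied together.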
