A Multimodal People Recognition System for an Intelligent Environment

This paper presents a multimodal system for recognizing people in intelligent environments. Users are identified and tracked by detecting and recognizing their voices and faces through cameras and microphones distributed around the environment. The multimodal approach was chosen to obtain a flexible, inexpensive, yet reliable system built from consumer electronics. Voice features are extracted through short-time spectrum analysis, while face features are extracted using the eigenfaces technique. Recognition is performed by Support Vector Machines, one per modality, which learn and classify the features of each person; bindings between modalities are learnt through a cross-anchoring learning rule based on the mutual exclusivity selection principle. The system has been developed using NMM, a middleware capable of splitting sensor processing across several software nodes, which makes the system scalable in the number of cameras and microphones.
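
As an illustration of the voice front end, the following is a minimal sketch of short-time spectrum analysis: the signal is cut into overlapping frames, each frame is windowed, and a magnitude spectrum is computed per frame. The abstract does not state the frame length, hop size, or window type, so the 64-sample frame, 32-sample hop, Hann window, and the function names `hann` and `short_time_spectrum` are assumptions made only for this example, not the parameters of the system described here.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n (assumed window type)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def short_time_spectrum(signal, frame_len=64, hop=32):
    """Return one magnitude spectrum (bins 0..frame_len//2) per windowed frame."""
    window = hann(frame_len)
    spectra = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        # Naive DFT, kept explicit for clarity; a real system would use an FFT.
        mags = []
        for k in range(frame_len // 2 + 1):
            acc = sum(
                frame[t] * cmath.exp(-2j * math.pi * k * t / frame_len)
                for t in range(frame_len)
            )
            mags.append(abs(acc))
        spectra.append(mags)
    return spectra


# Synthetic test tone: 1 kHz sine sampled at 8 kHz (hypothetical values).
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(256)]
frames = short_time_spectrum(tone)

# Index of the strongest frequency bin in the first frame.
peak_bin = max(range(len(frames[0])), key=frames[0].__getitem__)
```

Each row of `frames` could then serve as a feature vector for the voice classifier, in the same way that eigenface projections serve as feature vectors for the face classifier. Bin `k` corresponds to a frequency of `k * SAMPLE_RATE / frame_len` Hz.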
