Acoustic and visual signal based context awareness system for mobile application

In this paper, an acoustic and visual signal based context awareness system is proposed for mobile applications. In particular, a multimodal system is designed that can sense and determine user contextual information in real time, such as where the user is or what the user is doing, by processing acoustic and visual signals from the sensors available in a mobile device. A variety of contextual cues, such as babble sound in a cafeteria or the user's movement, can be recognized by the proposed acoustic and visual feature extraction and classification methods. We first describe the overall structure of the proposed system and then present the algorithm of each module that detects or classifies the various contextual scenarios. Representative experiments demonstrate the superiority of the proposed system, and an actual implementation on a mobile device such as a smartphone confirms its effectiveness and practical realizability.
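
As a rough illustration of the kind of pipeline the abstract describes, the following Python sketch extracts acoustic features (MFCC statistics) from a short audio clip and a simple motion cue (mean dense optical-flow magnitude) from two consecutive camera frames, then classifies the fused feature vector with an SVM. The specific features, the Farneback flow, the SVM, and all file and variable names are assumptions chosen for illustration only; they are not the paper's actual feature extraction and classification methods, which the abstract does not detail.

import numpy as np
import cv2
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def acoustic_features(wav_path, sr=16000, n_mfcc=13):
    # Mean and standard deviation of MFCCs over a short clip -> fixed-length acoustic descriptor.
    y, _ = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def motion_feature(prev_gray, next_gray):
    # Mean dense optical-flow magnitude between two grayscale frames -> scalar motion cue.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    return np.array([np.linalg.norm(flow, axis=2).mean()])

def multimodal_vector(wav_path, prev_gray, next_gray):
    # Simple feature-level fusion: concatenate the acoustic and visual descriptors.
    return np.concatenate([acoustic_features(wav_path),
                           motion_feature(prev_gray, next_gray)])

# Usage with placeholder data: train on labeled recordings plus camera frame pairs,
# then predict the context (e.g. "cafeteria", "walking") of a new observation.
# X = np.stack([multimodal_vector(w, f0, f1) for (w, f0, f1) in training_samples])
# clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X, training_labels)
# print(clf.predict([multimodal_vector("query.wav", frame_prev, frame_next)]))

Feature-level fusion by concatenation keeps the sketch compact; a deployed system could just as well classify each modality separately and fuse the decisions, a choice the abstract leaves open.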
