Human Augmented Cognition Based on Integration of Visual and Auditory Information

In this paper, we propose a new multi-sensory fusion model for human identification to support human augmented cognition. In the proposed model, facial features and mel-frequency cepstral coefficients (MFCCs) serve as the visual and auditory features for identifying a person, respectively, and an AdaBoost model identifies the person from the integrated visual and auditory features. Facial form features are obtained by principal component analysis (PCA) of the face area, which is localized by an AdaBoost algorithm combined with a skin-color preferable attention model, while MFCCs are extracted from the person's speech. The proposed multi-sensory integration model thus aims to enhance human identification performance by exploiting visual and auditory cues that work complementarily under partly distorted sensory conditions. A human augmented cognition system based on the proposed identification model is implemented as a goggle-type device that presents information, such as an unknown person's profile, based on the identification result. Experimental results show that the proposed model performs human identification plausibly in an indoor meeting situation.
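The sketch below is a minimal illustration of such an audio-visual identification pipeline, not the authors' implementation: it assumes OpenCV's Haar-cascade face detector as a stand-in for the paper's AdaBoost-plus-attention face localization, librosa for MFCC extraction, and scikit-learn for PCA and the AdaBoost classifier over the fused features. All parameter values and helper names here are hypothetical.

```python
# Illustrative audio-visual identification sketch (not the authors' code).
# Assumptions: OpenCV Haar cascade stands in for AdaBoost face detection with
# skin-color attention; librosa supplies MFCCs; scikit-learn supplies PCA and
# the AdaBoost classifier that fuses the two feature sets.
import cv2
import librosa
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import AdaBoostClassifier

face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def face_feature(image_bgr, size=(32, 32)):
    """Detect the largest face and return its grayscale pixels as a vector."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # keep largest detection
    crop = cv2.resize(gray[y:y + h, x:x + w], size)
    return crop.flatten().astype(np.float32)

def audio_feature(wav_path, n_mfcc=13):
    """Return the utterance-level mean of the MFCC frames."""
    signal, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

def train(face_vectors, audio_vectors, labels, n_components=20):
    """Project face vectors with PCA, concatenate MFCCs, train AdaBoost."""
    pca = PCA(n_components=n_components).fit(np.stack(face_vectors))
    fused = np.hstack([pca.transform(np.stack(face_vectors)),
                       np.stack(audio_vectors)])
    clf = AdaBoostClassifier(n_estimators=100).fit(fused, labels)
    return pca, clf

def identify(pca, clf, image_bgr, wav_path):
    """Predict an identity label for one image/utterance pair."""
    f = face_feature(image_bgr)
    if f is None:
        return None
    fused = np.hstack([pca.transform(f[None, :]),
                       audio_feature(wav_path)[None, :]])
    return clf.predict(fused)[0]
```

The sketch fuses the two modalities at the feature level, feeding PCA-reduced face descriptors and MFCCs into a single boosted classifier, which mirrors the integrated-feature strategy described in the abstract.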
