Automated classification of gaze direction using spectral regression and support vector machine

This paper presents a framework to automatically estimate the gaze direction of an infant during infant-parent face-to-face interaction. Commercial devices are sometimes used to measure subjects' gaze direction automatically. This approach is intrusive, requires cooperation from the participants, and cannot be employed in interactive face-to-face communication between a parent and their infant. Alternatively, a human expert may manually code, from captured videos, whether the infant is gazing at or away from the parent's face. However, this approach is labor intensive. A preferred alternative is to estimate the gaze direction of participants automatically from captured videos. Such a system would help psychological scientists readily study and understand the early attention of infants. One of the problems in eye region image analysis is the high dimensionality of the visual data. We address this problem by employing the spectral regression technique to project high-dimensional eye region images into a low-dimensional subspace. The eye region images represented in this low-dimensional subspace are then used to train a Support Vector Machine (SVM) classifier to predict the gaze direction (i.e., either looking at the parent's face or looking away from the parent's face). An analysis of more than 39,000 video frames of naturalistic gaze shifts from multiple infants demonstrates significant agreement between a human coder and our approach. These results indicate that the proposed system provides an efficient approach to automating the estimation of gaze direction in naturalistic gaze shifts.
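The two-stage pipeline described above (project eye region images into a low-dimensional subspace, then classify with an SVM) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes an unsupervised nearest-neighbor graph, a ridge-regularized linear projection, and an RBF-kernel SVM, none of which the abstract specifies, and it runs on synthetic vectors standing in for eye region images. The function name `fit_spectral_regression` and all parameter values are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.manifold import SpectralEmbedding
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def fit_spectral_regression(X, n_components=5, n_neighbors=10, alpha=1.0, seed=0):
    """Learn a linear map from pixel space to a low-dimensional subspace.

    Step 1: compute a spectral (graph Laplacian) embedding of the training set.
    Step 2: fit a ridge regression from pixels to the embedding coordinates,
            so that unseen images can be projected without re-solving the
            eigenproblem.
    """
    embedding = SpectralEmbedding(
        n_components=n_components,
        affinity="nearest_neighbors",
        n_neighbors=n_neighbors,
        random_state=seed,
    )
    Y = embedding.fit_transform(X)   # (n_samples, n_components)
    projector = Ridge(alpha=alpha)   # regularized least squares
    projector.fit(X, Y)
    return projector


# Synthetic stand-in data (an assumption): 20x20 "eye patches" flattened to
# 400-dimensional vectors, with a class-dependent pattern plus pixel noise.
rng = np.random.default_rng(0)
n_per_class, n_pixels = 100, 400
pattern = rng.choice([-1.0, 1.0], size=n_pixels)
X_at = rng.normal(size=(n_per_class, n_pixels)) + 0.25 * pattern
X_away = rng.normal(size=(n_per_class, n_pixels)) - 0.25 * pattern
X = np.vstack([X_at, X_away])
y = np.array([1] * n_per_class + [0] * n_per_class)  # 1 = at face, 0 = away

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Fit the projection on training frames only, then project both splits.
projector = fit_spectral_regression(X_train)
Z_train = projector.predict(X_train)
Z_test = projector.predict(X_test)

# Train the SVM on the low-dimensional representation.
classifier = SVC(kernel="rbf", C=1.0, gamma="scale")
classifier.fit(Z_train, y_train)
accuracy = classifier.score(Z_test, y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

The regression step is what makes the approach practical for video: once the projection is learned, each new frame is mapped into the subspace by a single matrix product. Accuracy on this synthetic data says nothing about performance on real infant gaze footage.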
