Understanding eye movements in face recognition with hidden Markov model

Tim Chuk (u3002534@connect.hku.hk) 1, Alvin C. W. Ng (asangfai@gmail.com) 2, Emanuele Coviello (ecoviell@ucsd.edu) 3, Antoni B. Chan (abchan@cityu.edu.hk) 2, Janet H. Hsiao (jhsiao@hku.hk) 1

1 Department of Psychology, The University of Hong Kong, Pokfulam Road, Hong Kong
2 Department of Computer Science, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
3 Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA, USA

Abstract

In this paper we propose a hidden Markov model (HMM)-based method for analyzing eye movement data. We conducted a simple face recognition task and recorded the participants' eye movements and performance. We used a variational Bayesian framework for Gaussian mixture models to estimate the distribution of fixation locations, and modeled the fixation and transition data using HMMs. We showed that with HMMs, individuals' eye movement strategies can be described in terms of both fixation locations and transition probabilities. By clustering these HMMs, we found that the strategies fell into two subgroups: one more holistic and the other more analytical. Furthermore, correct and incorrect recognitions were associated with distinctive eye movement strategies; the difference between these strategies lay in their transition probabilities.

Keywords: Hidden Markov Model (HMM); eye movement; scan path; holistic processing; face recognition.

Introduction

In the late 19th century, soon after Edmund Huey's invention of the world's first eye tracker, researchers discovered that in many daily-life activities, eye movements are rapid, discontinuous, and interrupted by brief fixations (Wade & Tatler, 2011). This finding is now widely accepted and described as the 'saccade and fixate' strategy (Land, 2011). Eye movements have been found to facilitate face learning and recognition. For instance, Henderson et al. (2005) showed that when participants were restricted to viewing face images only at the center, their recognition performance was significantly worse than when they were allowed to view the images freely. Autistic patients, who could not judge facial expressions correctly, were found to have abnormal eye fixation patterns (Pelphrey et al., 2002).

Empirical studies on the relationship between eye movements and face recognition have primarily focused on identifying regions of interest (ROIs). An ROI is a region of the face that people frequently fixate on, such as the two eyes. Early studies often divided a face into several regions and then identified ROIs by comparing how frequently each region was fixated. However, this approach suffered from the lack of an objective way to divide faces. For instance, Barton et al. (2006) defined the two eyes as two irregularly shaped ROIs, while Henderson et al. (2005) defined the two eyes as one ROI. Another problem is that predefined ROIs may not represent the data well, because different individuals have different saccade patterns.

More recent studies have attempted to discover ROIs directly from the data. A commonly adopted approach is to generate statistical fixation maps. A fixation map is created by identifying the locations of the fixations and convolving a Gaussian kernel with each fixation. Two fixation maps can then be compared with the Pixel test, which identifies pixels that differ statistically significantly (Caldara & Miellet, 2011). Using fixation maps, it was found that the upper center (i.e., the nose) and the upper left (i.e., the left half of the nose and the left eye) of a face were the two most frequently viewed areas (Hsiao & Cottrell, 2008). This result was consistent with an earlier study that used the Bubbles technique to discover regions with diagnostic features in face recognition (Gosselin & Schyns, 2001). Fixation maps also showed that children from different cultural backgrounds exhibited different eye fixation patterns (Kelly et al., 2011).

The use of fixation maps in face recognition studies has been fruitful. However, as discussed earlier, eye movements combine saccades and fixations, and the fixations recorded in eye movement studies should be treated as time-series data collected over time. The eyes fixate at a location briefly before a saccade brings them to the next location. Many studies have shown that saccades can be influenced by top-down expectations as well as bottom-up input. Yarbus's (1965) well-known eye movement studies showed that depending on what people expected to see, they exhibited different saccade patterns when looking at the same target image. Mannan et al. (1997) found that saccades were more likely to be directed to the more 'informative' areas of an image, such as edges and high-spatial-frequency regions. These findings imply that the target location of a saccade can be treated as a random variable with a set of possible values, each associated with a probability. In this sense, eye movements may be considered a stochastic process, which could be better understood using time-series probabilistic models. Fixation maps, however, contain no temporal information.

Currently, there are two methods for describing the temporal information in eye movement data. One is the string-editing method. It requires an image to be divided
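The fixation-map construction described above can be sketched in a few lines: each recorded fixation contributes a Gaussian bump to a pixel grid, which is equivalent to convolving a map of fixation counts with a Gaussian kernel. This is only an illustrative sketch; the grid size, kernel width, and fixation coordinates below are hypothetical, not values from the study.

```python
import math

def fixation_map(fixations, width, height, sigma=10.0):
    """Sum a Gaussian bump centered on every fixation over a pixel grid."""
    fmap = [[0.0] * width for _ in range(height)]
    for fx, fy in fixations:
        for y in range(height):
            for x in range(width):
                d2 = (x - fx) ** 2 + (y - fy) ** 2
                fmap[y][x] += math.exp(-d2 / (2 * sigma ** 2))
    return fmap

# Three hypothetical fixations; two cluster together, one lies apart.
fmap = fixation_map([(30, 20), (32, 22), (60, 40)], width=64, height=48)
peak = max(max(row) for row in fmap)  # hottest pixel lies near the cluster
```

Maps built this way for two conditions (or two groups of viewers) can then be compared pixel-by-pixel, which is the role the Pixel test plays in Caldara and Miellet (2011).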
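The stochastic-process view above is what motivates the HMM: hidden states play the role of data-driven ROIs, each emitting 2-D fixation locations from its own Gaussian, while the transition matrix captures the saccade dynamics that fixation maps discard. As a minimal sketch (not the authors' implementation, which uses variational Bayesian estimation and HMM clustering), the forward algorithm below scores a fixation sequence under a two-state HMM; all parameter values are hypothetical.

```python
import math

def gauss2d(x, mean, var):
    """Density of an isotropic 2-D Gaussian (a simplifying assumption here)."""
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    return math.exp(-(dx * dx + dy * dy) / (2 * var)) / (2 * math.pi * var)

def forward_log_likelihood(fixations, pi, A, means, variances):
    """Log-likelihood of a fixation sequence via the scaled forward algorithm."""
    n = len(pi)
    alpha = [pi[s] * gauss2d(fixations[0], means[s], variances[s]) for s in range(n)]
    log_lik = 0.0
    for t in range(1, len(fixations)):
        scale = sum(alpha)          # rescale to avoid numerical underflow,
        log_lik += math.log(scale)  # accumulating the log of each scale
        alpha = [a / scale for a in alpha]
        alpha = [gauss2d(fixations[t], means[s], variances[s]) *
                 sum(alpha[r] * A[r][s] for r in range(n))
                 for s in range(n)]
    return log_lik + math.log(sum(alpha))

# Hypothetical 2-state model: state 0 ~ an "eyes" ROI, state 1 ~ a "nose" ROI.
pi = [0.6, 0.4]                          # prior over the first fixation's state
A = [[0.3, 0.7],                         # transition probabilities between ROIs
     [0.5, 0.5]]
means = [(120.0, 80.0), (120.0, 140.0)]  # ROI centers in image coordinates
variances = [200.0, 200.0]

seq = [(118.0, 82.0), (122.0, 138.0), (119.0, 141.0)]  # a short scan path
ll = forward_log_likelihood(seq, pi, A, means, variances)
```

Because the likelihood factors over both where the eyes land (emissions) and where they go next (transitions), two viewers with identical fixation maps but different transition probabilities receive different scores, which is the distinction the clustering in the paper exploits.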

References

[1] Barton, J. J. S., et al. (2006). Information processing during face recognition: The effects of familiarity, inversion, and morphing on scanning fixations. Perception.

[2] Hsiao, J. H., & Cottrell, G. (2008). Two fixations suffice in face recognition. Psychological Science.

[3] Kelly, D. J., Jack, R. E., et al. (2011). Social experience does not abolish cultural diversity in eye movements. Frontiers in Psychology.

[4] Wade, N. J., & Tatler, B. W. (2011). Origins and applications of eye movement research.

[5] Caldara, R., & Miellet, S. (2011). iMap: A novel method for statistical fixation mapping of eye movement data. Behavior Research Methods.

[6] Barber, D. (2012). Bayesian Reasoning and Machine Learning.

[7] Pelphrey, K. A., et al. (2002). Visual scanning of faces in autism. Journal of Autism and Developmental Disorders.

[8] Coviello, E., Chan, A. B., et al. (2012). The variational hierarchical EM algorithm for clustering hidden Markov models. NIPS.

[9] Henderson, J. M., Williams, C. C., et al. (2005). Eye movements are functional during face learning. Memory & Cognition.

[10] Bishop, C. M. (2006). Pattern Recognition and Machine Learning.

[11] Mannan, S. K., et al. (1997). Fixation patterns made during brief examination of two-dimensional images. Perception.

[12] Henderson, J. M. (2003). Human gaze control during real-world scene perception. Trends in Cognitive Sciences.

[13] Goldberg, J. H., et al. (2010). Scanpath clustering and aggregation. ETRA.

[14] Gosselin, F., & Schyns, P. G. (2001). Bubbles: A technique to reveal the use of information in recognition tasks. Vision Research.

[15] Land, M. F. (2011). Oculomotor behaviour in vertebrates and invertebrates.