Localizing the Gaze Target of a Crowd of People

Which target are many people looking at? Identifying that target is an important task in venues such as cinemas and stadiums. However, existing image-based eye-tracking methods struggle to estimate the gaze of every person in a crowd accurately and simultaneously, because each person is imaged at low resolution when the whole crowd is captured by a distant camera. We therefore introduce a new approach for localizing the gaze target that a crowd of people is focusing on. The proposed framework aggregates the gaze estimates obtained individually for each person, which makes it possible to localize the shared target even though each individual estimate from a low-resolution image is inaccurate. Using images of a crowd in a tennis stadium, and assuming that all of the people are focusing on the same target, we analyze how the choice of aggregation method and the number of people included in the aggregation affect localization accuracy. The results show that the localization accuracy of the proposed method improves as it is applied to a larger crowd.
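The intuition behind aggregating many inaccurate estimates can be illustrated with a small synthetic sketch. The abstract does not say which aggregation methods were compared, so the two aggregators below (a coordinate-wise mean and a vector median) are illustrative assumptions, as are the 2D target plane, the Gaussian noise model, and every function name. The numbers it produces come from simulated data and say nothing about the paper's actual results.

```python
import math
import random


def mean_point(points):
    """Coordinate-wise mean of a list of 2D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def vector_median(points):
    """Input point with the smallest summed Euclidean distance to all points."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))


def simulate_error(n_people, aggregate, trials=200, noise=1.0, seed=0):
    """Mean localization error over `trials` simulated crowds.

    Assumption: each person's estimated gaze point is the true target
    plus independent Gaussian noise (a hypothetical noise model).
    """
    rng = random.Random(seed)
    target = (0.0, 0.0)
    total = 0.0
    for _ in range(trials):
        estimates = [
            (rng.gauss(target[0], noise), rng.gauss(target[1], noise))
            for _ in range(n_people)
        ]
        total += math.dist(aggregate(estimates), target)
    return total / trials


if __name__ == "__main__":
    # Columns: crowd size, error with mean, error with vector median.
    for n in (1, 5, 25, 100):
        print(n,
              round(simulate_error(n, mean_point), 3),
              round(simulate_error(n, vector_median), 3))
```

Under this noise model, both aggregators should localize the target more accurately as the simulated crowd grows. A median-style aggregator is typically chosen when some individual estimates are gross outliers, since a single extreme point cannot drag it arbitrarily far.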
