Gaze movement-driven random forests for query clustering in automatic video annotation

In recent years, the rapid growth of multimedia content has led to the development of several automatic annotation approaches. In parallel, the wide availability of large amounts of user interaction data has revealed the need for automatic annotation techniques that exploit implicit user feedback during interactive multimedia retrieval tasks. In this context, this paper proposes a method for automatic video annotation that exploits implicit user feedback during interactive video retrieval, as expressed through gaze movements, mouse clicks, and queries submitted to a content-based video search engine. We use this interaction data to represent video shots with feature vectors based on aggregated gaze movements, and we train a classifier that can identify shots of interest for new users. Subsequently, we propose a framework that, during testing: a) identifies the topics (expressed as query clusters) for which new users are searching, using a novel clustering algorithm, and b) associates multimedia data (i.e., video shots) with the identified topics using supervised classification. The novel clustering algorithm is based on random forests and is driven by two factors: first, the distance measures between different sets of queries, and second, the homogeneity of the shots viewed within each query cluster defined by the clustering procedure; this homogeneity is inferred from the performance of the gaze-based classifier on these shots. The evaluation shows that aggregated gaze data can be exploited for video annotation purposes.
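As a minimal illustration of the gaze-based classification step described above, the sketch below trains a random forest on aggregated gaze features per video shot and predicts whether an unseen shot is of interest. The specific features (fixation count, mean fixation duration, saccade rate) and the synthetic data are assumptions made for illustration only; the paper defines its own gaze-derived feature set.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical aggregated gaze features per shot:
# [fixation count, mean fixation duration (ms), saccade rate (per s)]
# Shots of interest are assumed to attract more and longer fixations.
relevant = rng.normal([12, 300, 2.0], [2, 40, 0.3], size=(50, 3))
irrelevant = rng.normal([4, 150, 4.0], [2, 40, 0.3], size=(50, 3))

X = np.vstack([relevant, irrelevant])
y = np.array([1] * 50 + [0] * 50)  # 1 = shot of interest

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

# Predict interest for a new user's gaze aggregates on an unseen shot
print(clf.predict([[11, 290, 2.1]]))  # gaze pattern resembling "interest"
```

In the paper's framework, the per-cluster accuracy of such a classifier is what drives the homogeneity term of the query-clustering objective.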
