Egocentric Video Search via Physical Interactions

Retrieving past egocentric videos of personal daily life is important for supporting and augmenting human memory. Most previous retrieval approaches have ignored a crucial feature: interactions between humans and the physical world, which are closely related to our memory and experience of daily activities. In this paper, we propose a gesture-based egocentric video retrieval framework that retrieves past visual experiences using body gestures as non-verbal queries. We use a probabilistic framework based on canonical correlation analysis that models physical interactions through a latent space and uses them to retrieve egocentric videos and re-rank search results. By incorporating physical interactions into the retrieval models, we address the problems caused by the variability of human motion. We evaluate the proposed method on motion and egocentric video datasets of daily activities in household settings, and demonstrate that our framework robustly improves retrieval performance when searching both personal video archives and those of other people.
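The core idea of the abstract, projecting gesture (motion) features and egocentric video features into a shared latent space learned by canonical correlation analysis and ranking videos by similarity to a gesture query, can be sketched as follows. This is a minimal illustrative sketch in NumPy, not the authors' implementation: `fit_cca`, `rank_videos`, the regularization, and the feature shapes are all assumptions for the example, and the paper's probabilistic CCA variant is simplified here to plain linear CCA.

```python
import numpy as np

def fit_cca(X, Y, d=2, reg=1e-3):
    """Fit linear CCA between gesture features X (n x p) and video
    features Y (n x q); returns the two projection matrices.
    (Hypothetical helper; regularized whitening keeps it stable for small n.)"""
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    n = X.shape[0]
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    # Whitening matrices from Cholesky factors: cov(X @ Wx) = I.
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx)).T
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy)).T
    # SVD of the whitened cross-covariance gives the canonical directions.
    U, _, Vt = np.linalg.svd(Wx.T @ Cxy @ Wy)
    return Wx @ U[:, :d], Wy @ Vt.T[:, :d]

def rank_videos(query_gesture, video_feats, Ax, Ay):
    """Project a (pre-centered) gesture query and the video features into
    the shared latent space, then rank videos by cosine similarity."""
    q = query_gesture @ Ax
    V = video_feats @ Ay
    q = q / np.linalg.norm(q)
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    return np.argsort(-(V @ q))  # indices, best match first
```

A re-ranking step, as described in the abstract, would then reorder an initial result list using these latent-space similarities rather than ranking the whole archive from scratch.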
