Where is my Phone?: Personal Object Retrieval from Egocentric Images

This work presents a retrieval pipeline and an evaluation scheme for the problem of finding the last appearance of personal objects in a large dataset of images captured by a wearable camera. Each personal object is modelled by a small set of images that define a query for a visual search engine. The retrieved results are reranked according to the timestamps of the images, so that later detections gain relevance. Finally, a temporal interleaving of the results is introduced for robustness against false detections. The Mean Reciprocal Rank is proposed as a metric to evaluate this problem. This application could help in developing personal assistants that assist users who do not remember where they left their personal belongings.
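As an illustration of the evaluation metric and the reranking idea, the following is a minimal Python sketch, not the authors' implementation. The Mean Reciprocal Rank follows its standard definition: the average, over all queries, of 1/rank of the first relevant result. The function `rerank_by_time`, its `threshold` parameter, and the `(image_id, score, timestamp)` tuple layout are hypothetical; they assume that candidates whose visual score passes a threshold are reordered from newest to oldest, which is only one possible reading of the temporal reranking described above.

```python
from datetime import datetime
from typing import Hashable, Iterable, List, Sequence, Set, Tuple


def reciprocal_rank(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Return 1/rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(
    runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Average the reciprocal rank over (ranked_list, relevant_set) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / len(runs)


def rerank_by_time(
    scored: Sequence[Tuple[str, float, datetime]], threshold: float
) -> List[str]:
    """Hypothetical temporal reranking (an assumption, not the paper's rule).

    Candidates scoring at least `threshold` are placed first, newest to oldest;
    the remaining candidates follow in descending order of visual score.
    """
    confident = [c for c in scored if c[1] >= threshold]
    rest = [c for c in scored if c[1] < threshold]
    confident.sort(key=lambda c: c[2], reverse=True)  # latest timestamp first
    rest.sort(key=lambda c: c[1], reverse=True)       # best visual score first
    return [image_id for image_id, _, _ in confident + rest]
```

For this task, the relevant set of a query would hold the images that show the last appearance of the object, so a system that ranks such an image first for every object reaches an MRR of 1.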
