Photo Recall: Using the Internet to Label Your Photos

We describe a system for searching your personal photos using an extremely wide range of text queries: dates and holidays ("Halloween"), named and categorical places ("Empire State Building" or "park"), events and occasions ("Radiohead concert" or "wedding"), activities ("skiing"), object categories ("whales"), attributes ("outdoors"), and object instances ("Mona Lisa")--or any combination of these--all with no manual labeling required. We accomplish this by correlating information in your photos--timestamps, GPS locations, and image pixels--with information mined from the Internet. This includes matching dates to holidays listed on Wikipedia; matching GPS coordinates to places listed on Wikimapia; combining places and dates to find named events via Google; recognizing visual categories with classifiers either pre-trained on ImageNet or trained on the fly from Google Image Search results; and matching object instances with interest-point-based matching, again using results from Google Images. We tie all of these disparate sources of information together in a unified way, allowing fast and accurate searches using whatever information you remember about a photo. We quantitatively evaluate several aspects of our system and show excellent performance in all respects. Please watch a video demonstrating our system in action on a wide range of queries at http://youtu.be/Se3bemzhAiY.
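
To make the metadata-correlation idea concrete, here is a minimal sketch (not the authors' implementation) of how a photo's timestamp and GPS fix could be turned into searchable text labels by matching against mined holiday and place data. The HOLIDAYS and PLACES tables, the Photo class, and the labels_for function are hypothetical stand-ins for the data the paper mines from Wikipedia and Wikimapia.

```python
from dataclasses import dataclass
from datetime import datetime
from math import radians, sin, cos, asin, sqrt

@dataclass
class Photo:
    timestamp: datetime
    lat: float
    lon: float

# Hypothetical mined data: (month, day) -> holiday name, and named places with coordinates.
HOLIDAYS = {(10, 31): "Halloween", (7, 4): "Independence Day"}
PLACES = [("Empire State Building", 40.7484, -73.9857),
          ("Central Park", 40.7829, -73.9654)]

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def labels_for(photo: Photo, radius_km: float = 1.0) -> list[str]:
    """Return text labels for a photo by matching its date to holidays
    and its GPS coordinates to nearby named places."""
    labels = []
    holiday = HOLIDAYS.get((photo.timestamp.month, photo.timestamp.day))
    if holiday:
        labels.append(holiday)
    for name, lat, lon in PLACES:
        if haversine_km(photo.lat, photo.lon, lat, lon) <= radius_km:
            labels.append(name)
    return labels

print(labels_for(Photo(datetime(2013, 10, 31, 19, 30), 40.749, -73.986)))
# -> ['Halloween', 'Empire State Building']
```

In the full system these labels would be merged with visual predictions (pre-trained or on-the-fly classifiers and instance matching) into a single index, so a query can combine whatever the user remembers about a photo.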
