Mobile Visual Search for Digital Heritage Applications

In this chapter, we demonstrate a complete pipeline for multimedia retrieval on a mobile device. We target the use case of a tourist at a heritage site who wishes to guide herself by photographing an interesting structure and receiving information about it. This requires efficient on-device instance retrieval over a dataset of thousands of images, which in turn requires a significant reduction in the size of the visual index. To achieve this, we describe a set of strategies that reduce the size of the visual index structure compared to a standard instance retrieval implementation found on desktops or servers. While our proposed reduction steps lower the overall mean Average Precision (mAP), they maintain good precision for the top K results (\(P_K\)). We argue that for such an offline application, maintaining a good \(P_K\) is sufficient. Such an instance retrieval framework depends on a well-annotated dataset of images to retrieve from. Photos from tourist and heritage sites can often be described with detailed, part-wise annotations. Manually annotating a large community photo collection is costly and redundant, since similar images share the same annotations. Hence, we also demonstrate an interactive web-based annotation tool that allows multiple users to add, view, edit and suggest rich annotations for images in community photo collections. Since the number of distinct annotations may be small, we provide an easy and efficient batch annotation approach that uses an image similarity graph pre-computed with instance retrieval and matching. This helps in seamlessly propagating annotations of the same objects or similar images across the entire dataset.
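The distinction between mAP and \(P_K\) drawn above can be illustrated with a minimal sketch. The ranked lists below are hypothetical and are not results from this chapter; the function names are ours. The sketch only shows how a compressed index could lose relevant images deep in the ranking, which lowers Average Precision, while leaving the top K results intact.

```python
from typing import Sequence


def precision_at_k(relevance: Sequence[bool], k: int) -> float:
    """Fraction of the top-k ranked results that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(relevance[:k]) / k


def average_precision(relevance: Sequence[bool], num_relevant: int) -> float:
    """Mean of the precision values at each rank holding a relevant result,
    normalised by the number of relevant images in the database."""
    if num_relevant <= 0:
        return 0.0
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank
    return total / num_relevant


# Hypothetical rankings for one query with 6 relevant database images.
# True marks a relevant result at that rank.
full_index = [True, True, True, True, True, True, False, False]
small_index = [True, True, True, False, False, False, True, False]

for name, ranking in (("full", full_index), ("small", small_index)):
    p3 = precision_at_k(ranking, 3)
    ap = average_precision(ranking, 6)
    print(f"{name}: P_3 = {p3:.2f}, AP = {ap:.2f}")
```

In this made-up example both rankings have the same \(P_3\), while the Average Precision of the second ranking is clearly lower. A tourist who only looks at the first few results would see no difference between the two.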
