Interactive object-based image retrieval and annotation on iPad

The Apple iPad is a portable tablet computer that offers users a general-purpose platform for consumer media, including games, books, and movies. Although the iPad is quickly gaining popularity, its use in content-based image retrieval and annotation is still in its infancy. This paper develops an interactive system for efficiently retrieving and annotating image objects on the iPad. The system consists of two components: a front-end graphical user interface (GUI) and a back-end retrieval model. For the first component, we implement an iPad-based GUI that gives users an efficient way to select query objects and that facilitates annotation. For the second component, we propose an object-based image retrieval algorithm that combines a novel feature descriptor, based on context-preserving bags-of-words (BoW), with a two-stage re-ranking technique to measure the similarity between the query image and each image in the database. The retrieval results are returned to and visualized on the iPad-based GUI, and annotations provided by users can be propagated among the retrieved images. The front-end GUI and the back-end module communicate over a wireless network. Comprehensive experiments on several benchmark datasets demonstrate the effectiveness of the proposed framework.
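
The general idea of ranking database images by BoW similarity and then re-ranking a shortlist can be illustrated with a minimal sketch. This is not the paper's descriptor or its two-stage re-ranking technique, whose details are not given in the abstract. It assumes that each image is already represented as a list of visual-word ids in scan order, uses cosine similarity of word histograms as a coarse first stage, and uses the overlap of neighbouring word pairs as a crude stand-in for spatial context in the second stage. All function names and the `shortlist` parameter are hypothetical.

```python
from collections import Counter
from math import sqrt


def bow_histogram(word_ids):
    """Count how often each visual-word id occurs in one image."""
    return Counter(word_ids)


def cosine(h1, h2):
    """Cosine similarity between two sparse histograms."""
    dot = sum(h1[w] * h2[w] for w in h1.keys() & h2.keys())
    n1 = sqrt(sum(v * v for v in h1.values()))
    n2 = sqrt(sum(v * v for v in h2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def adjacent_pairs(word_ids):
    """Neighbouring word pairs: an assumed, simplified notion of context."""
    return set(zip(word_ids, word_ids[1:]))


def jaccard(a, b):
    """Overlap of two sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query, database, shortlist=2):
    """Return database image names, best match first.

    Stage 1 ranks every image by histogram similarity (order-free).
    Stage 2 re-orders only the top `shortlist` images by pair overlap.
    """
    q_hist = bow_histogram(query)
    coarse = sorted(
        database,
        key=lambda name: cosine(q_hist, bow_histogram(database[name])),
        reverse=True,
    )
    top, rest = coarse[:shortlist], coarse[shortlist:]

    q_pairs = adjacent_pairs(query)
    top.sort(
        key=lambda name: jaccard(q_pairs, adjacent_pairs(database[name])),
        reverse=True,
    )
    return top + rest


if __name__ == "__main__":
    # Toy data: "a" and "b" contain the same words, but only "a"
    # preserves the query's word ordering.
    database = {"b": [4, 3, 2, 1], "a": [1, 2, 3, 4], "c": [7, 8, 9, 9]}
    print(retrieve([1, 2, 3, 4], database))
```

In the toy data, the plain histograms of "a" and "b" are identical, so the first stage cannot separate them; only the context-aware second stage moves "a" ahead of "b". This is the kind of ambiguity that a context-preserving representation is meant to resolve.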
