Combining global and local matching of multiple features for precise item image retrieval

With the rapid growth of online shopping services, millions or even billions of commercial item images are now available on the Internet. Effectively leveraging visual search to find the items that interest users is an important yet challenging task. Besides global appearance (e.g., color, shape, or pattern), users often pay more attention to the local styles of certain products, so an ideal visual item search engine should support detailed and precise search for similar images, which is beyond the capabilities of current search systems. In this paper, we propose a novel system named iSearch, which combines global and local matching of multiple local features to retrieve item images precisely and interactively. We extract multiple local features, including scale-invariant feature transform (SIFT) descriptors, regional color moments, and object contour fragments, to sufficiently represent the visual appearance of items, and we support both global and local matching over a large-scale image dataset. To this end, we develop an effective method for encoding and indexing contour fragments. Meanwhile, to improve the matching robustness of local features, we encode the spatial context with grid representations and propose a simple but effective verification approach that uses triangle relation constraints for spatial consistency filtering. Experimental evaluations show promising results for our approach and system.
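The abstract does not spell out how the triangle relation constraints are applied, so the following is only an illustrative sketch under an assumed reading: for every triple of matched keypoints, the triangle they form should have the same orientation (clockwise or counter-clockwise) in the query image and in the candidate image. The function names, the scoring rule, the `min_score` threshold, and the iterative removal loop are all hypothetical choices made for this example and are not taken from the iSearch system.

```python
from itertools import combinations


def orientation(p, q, r):
    """Sign of the signed area of triangle (p, q, r):
    +1 counter-clockwise, -1 clockwise, 0 collinear."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (cross > 0) - (cross < 0)


def triangle_consistency(matches):
    """matches: list of ((xq, yq), (xc, yc)) query/candidate point pairs.

    For each match, return the fraction of non-degenerate triangles
    containing it whose orientation agrees in both images.
    A match that appears in no usable triangle gets a score of 1.0.
    """
    n = len(matches)
    agree = [0] * n
    total = [0] * n
    for i, j, k in combinations(range(n), 3):
        o_query = orientation(matches[i][0], matches[j][0], matches[k][0])
        o_cand = orientation(matches[i][1], matches[j][1], matches[k][1])
        if o_query == 0 or o_cand == 0:
            continue  # collinear triple carries no orientation evidence
        ok = o_query == o_cand
        for idx in (i, j, k):
            total[idx] += 1
            agree[idx] += int(ok)
    return [a / t if t else 1.0 for a, t in zip(agree, total)]


def filter_matches(matches, min_score=0.8):
    """Assumed filtering rule: repeatedly drop the least consistent match
    until every remaining match scores at least min_score."""
    kept = list(matches)
    while len(kept) >= 3:
        scores = triangle_consistency(kept)
        worst = min(range(len(kept)), key=scores.__getitem__)
        if scores[worst] >= min_score:
            break
        del kept[worst]
    return kept


# Four matches related by a pure translation, plus one mismatched point.
matches = [
    ((0, 0), (3, 4)),
    ((10, 0), (13, 4)),
    ((10, 10), (13, 14)),
    ((0, 10), (3, 14)),
    ((5, 3), (-20, 40)),  # outlier: lands far outside the translated square
]
kept = filter_matches(matches)
```

Enumerating all triples costs O(n^3) per pass, so a check of this kind is only practical on the short list of matches that survives an earlier indexing or voting stage. Dropping one match at a time matters here because a single bad match also lowers the scores of the good matches that share triangles with it.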
