Text-to-image retrieval based on incremental association via multimodal hypernetworks

Text-to-image retrieval is the task of retrieving images associated with textual queries. For practical use, a text-to-image retrieval model requires an incremental learning method, since the volume of multimodal data grows dramatically. Here we propose an incremental text-to-image retrieval method using a multimodal association model. The association model is based on a hypernetwork (HN) in which a vertex corresponds to a textual word or a visual patch and a hyperedge represents a higher-order multimodal association. The HN is learned incrementally by sequential Bayesian sampling. At retrieval time, a given text query is crossmodally expanded into a visual query, and the images most similar to the expanded visual query are retrieved. We evaluated the proposed method on 3,000 images with textual descriptions from Flickr.com. The experimental results show that the proposed method achieves competitive retrieval performance compared to a baseline method. Moreover, we demonstrate that our method provides robust text-to-image retrieval results as the data increase.
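
The retrieval pipeline described above (hyperedges linking words and patches, crossmodal query expansion, then image ranking) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Hyperedge`, `add_example`, `expand_query`, and `retrieve` are hypothetical, visual patches are assumed to be already quantized into discrete identifiers, and the sequential Bayesian sampling step is replaced by simply appending one weighted hyperedge per training example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperedge:
    """One multimodal association: a set of words tied to a set of patch ids."""
    words: frozenset
    patches: frozenset
    weight: float = 1.0


def add_example(hypernet, words, patches, weight=1.0):
    """Incrementally extend the hypernetwork with one new hyperedge.

    Assumption: a plain append stands in for the paper's sequential
    Bayesian sampling, which is not specified in the abstract.
    """
    hypernet.append(Hyperedge(frozenset(words), frozenset(patches), weight))


def expand_query(hypernet, query_words):
    """Crossmodally expand a text query into a weighted bag of visual patches."""
    query = set(query_words)
    visual_query = Counter()
    for edge in hypernet:
        overlap = len(edge.words & query)
        if overlap:
            # Each matching hyperedge votes for its patches,
            # scaled by its weight and by how many query words it contains.
            for patch in edge.patches:
                visual_query[patch] += edge.weight * overlap
    return visual_query


def retrieve(visual_query, image_index, top_k=3):
    """Rank images by the summed weight of the expanded patches they contain."""
    scores = {
        image_id: sum(visual_query[p] for p in patches)
        for image_id, patches in image_index.items()
    }
    # Sort by descending score; break ties by image id for determinism.
    ranked = sorted(scores, key=lambda image_id: (-scores[image_id], image_id))
    return ranked[:top_k]
```

In this sketch, new image-text pairs can be added with `add_example` at any time without retraining, which mirrors the incremental setting only in spirit; the scoring rule (summed patch weights) is likewise an illustrative choice rather than the similarity measure used in the paper.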
