A Content-Based Image Retrieval Method Based on the Google Cloud Vision API and WordNet

A content-based image retrieval (CBIR) method analyzes the content of an image and extracts features that describe it, known as image annotations (or image labels). A machine learning (ML) algorithm is commonly used to obtain the annotations, but training and running one is time-consuming. A second problem in image labeling is the semantic gap. To address the first difficulty, this work uses the Google Cloud Vision API, which saves considerable computation time. To address the second, it defines a transformation method that uses WordNet to map terms not defined in the dataset vocabulary. The experiments use the 4,952 test images of the well-known Pascal VOC 2007 dataset, with image labeling through the Cloud Vision API implemented in the R language. At most ten labels are kept for each image, and only those with scores over 50. We also compare the Cloud Vision API with well-known ML algorithms. The API yields a mean average precision (mAP) of 42.4% over the 4,952 images, and the proposed approach outperforms three well-known ML algorithms. Hence, this work could be extended to other image datasets and could serve as a benchmark method when evaluating performance.
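The label selection and term mapping described above can be illustrated with a minimal sketch. It is not the authors' R implementation: the function names, the 0 to 100 score scale, and the small lookup table standing in for a WordNet query are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# A label is (description, score); a 0-100 score scale is assumed here.
Label = Tuple[str, float]

# Hypothetical stand-in for a WordNet lookup: maps terms outside the
# dataset vocabulary to a dataset class. Entries are illustrative only.
TERM_TO_CLASS: Dict[str, str] = {
    "puppy": "dog",
    "kitten": "cat",
    "automobile": "car",
}


def select_labels(labels: List[Label],
                  min_score: float = 50.0,
                  max_labels: int = 10) -> List[Label]:
    """Keep labels scoring strictly above min_score, best first, capped at max_labels."""
    kept = [label for label in labels if label[1] > min_score]
    kept.sort(key=lambda label: label[1], reverse=True)
    return kept[:max_labels]


def map_to_classes(labels: List[Label],
                   classes: Set[str],
                   term_to_class: Dict[str, str]) -> List[str]:
    """Map label descriptions to dataset classes, dropping unmapped terms and duplicates."""
    mapped: List[str] = []
    for description, _score in labels:
        term = description.lower()
        target: Optional[str] = term if term in classes else term_to_class.get(term)
        if target is not None and target not in mapped:
            mapped.append(target)
    return mapped


if __name__ == "__main__":
    raw = [("Puppy", 91.0), ("Grass", 80.0), ("Car", 50.0), ("dog", 60.0)]
    selected = select_labels(raw)
    print(selected)
    print(map_to_classes(selected, {"dog", "cat", "car"}, TERM_TO_CLASS))
```

In this sketch a label scoring exactly 50 is discarded, which follows the "over 50" wording; a real pipeline would replace the lookup table with an actual WordNet query.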
