Paper Information - A Content-Based Image Retrieval Method Based on the Google Cloud Vision API and WordNet

Content-based image retrieval (CBIR) analyzes the content of an image and extracts features that describe it; these features are also called image annotations or image labels. Machine learning (ML) algorithms are commonly used to obtain the annotations, but this is a time-consuming process. The semantic gap is a second problem in image labeling. To address the first difficulty, we use the Google Cloud Vision API, which saves considerable computation time. To address the second, we define a transformation method that uses WordNet to map undefined terms. The experiments use the well-known Pascal VOC 2007 dataset with its 4,952 test images, and the image labeling is performed by calling the Cloud Vision API from the R language. At most ten labels are kept for each image, and only labels whose scores are over 50. We also compare the Cloud Vision API with well-known ML algorithms. The API yields a mean average precision (mAP) of 42.4% on the 4,952 images, and the proposed approach performs better than three well-known ML algorithms. This work could therefore be extended to other image datasets and used as a benchmark method when evaluating performance.
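The pipeline summarized in the abstract (keep at most ten labels scoring over 50, map unfamiliar labels to known classes, then score the ranking) can be illustrated with a minimal Python sketch. This is not the authors' R implementation. The `TOY_HYPERNYMS` table is a hypothetical, hand-written stand-in for WordNet lookups, `TARGET_CLASSES` is a small subset of the Pascal VOC class names, and all function names are assumptions made for illustration. The average precision shown is the plain non-interpolated form, which may differ from the variant used in the paper.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for WordNet hypernym links (child -> parent).
# A real system would query WordNet itself; these entries are illustrative.
TOY_HYPERNYMS: Dict[str, str] = {
    "labrador": "dog",
    "puppy": "dog",
    "kitten": "cat",
    "sedan": "car",
}

# A small subset of the 20 Pascal VOC class names.
TARGET_CLASSES = {"dog", "cat", "car", "person"}


def select_labels(
    scored: List[Tuple[str, float]],
    min_score: float = 50.0,
    max_labels: int = 10,
) -> List[Tuple[str, float]]:
    """Keep labels scoring above min_score, best first, at most max_labels."""
    kept = [(label, score) for label, score in scored if score > min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:max_labels]


def map_to_class(label: str, max_hops: int = 5) -> Optional[str]:
    """Walk up the toy hypernym chain until a target class is reached."""
    current = label.lower()
    for _ in range(max_hops + 1):
        if current in TARGET_CLASSES:
            return current
        if current not in TOY_HYPERNYMS:
            return None
        current = TOY_HYPERNYMS[current]
    return None


def average_precision(ranked_relevance: List[bool]) -> float:
    """Non-interpolated AP of one ranked list; True marks a relevant item."""
    hits = 0
    total = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(per_class: List[List[bool]]) -> float:
    """Mean of the per-class AP values."""
    if not per_class:
        return 0.0
    return sum(average_precision(r) for r in per_class) / len(per_class)
```

In this sketch, a label such as "Labrador" that is not itself a target class is resolved to "dog" through the hypernym table, while a label with no path to any target class is dropped (returned as `None`).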

Shih-Hsin Chen | Yi-Hui Chen
