Automated Image Annotation With Novel Features Based on Deep ResNet50-SLT

The growing number of digital images in personal archives and on websites has become unmanageable, making it challenging to retrieve images accurately from these large databases. Although such collections are popular for their convenience, they often lack proper indexing information, which makes it difficult for users to find what they need. Image annotation, which involves labeling images with descriptive keywords, is one of the most significant challenges in computer vision and multimedia. Computers do not understand the essence of images the way humans do; they can identify images only by their visual attributes, not by their deeper semantic meaning. Image annotation therefore relies on keywords to communicate the contents of an image to a computer system. Raw pixels, however, do not carry enough information to generate semantic concepts, which makes image annotation a complex task. Unlike text annotation, where the dictionary linking words to semantics is well established, image annotation lacks a clear definition of “words” or “sentences” that can be associated with the meaning of an image; this is known as the semantic gap. To address this challenge, this study aimed to characterize image content meaningfully to make information retrieval easier. An improved automatic image annotation (AIA) system was proposed to bridge the semantic gap between low-level computer features and human interpretation of images by assigning one or multiple labels to images. The proposed AIA system converts raw image pixels into semantic-level concepts, providing a clearer representation of the image content. The study combined ResNet50 and the slantlet transform (SLT) with word2vec, and principal component analysis with t-distributed stochastic neighbor embedding, to balance precision and recall.
This allowed the researchers to determine the optimal model for the proposed ResNet50-SLT AIA framework. A word2vec model was used with ResNet50-SLT, principal component analysis, and t-distributed stochastic neighbor embedding to improve image annotation prediction accuracy. The distributed representation approach encoded and stored information about image features. The proposed AIA system used seq2seq to generate sentences from the feature vectors. The system was evaluated on three popular datasets (Flickr8k, Corel-5k, and ESP-Game). The results showed that the proposed AIA scheme overcame the computational time complexity that most existing image annotation models incur during the training phase on large datasets. The performance evaluation showed flexible annotation, improved accuracy, and reduced computational cost, outperforming existing state-of-the-art methods. In conclusion, the AIA framework can help in accurately selecting and extracting image features and in retrieving images from large databases. The extracted features represent the image effectively, which accelerates the annotation process and reduces computational complexity.
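The abstract reports a balance between precision and recall but does not state how the two were computed. As a minimal sketch, assuming a per-image comparison between predicted keywords and ground-truth keywords (the function names `annotation_scores` and `mean_scores` and the example labels are hypothetical, not taken from the study), the trade-off could be measured as follows:

```python
from typing import Iterable, Set, Tuple


def annotation_scores(predicted: Set[str], truth: Set[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one image's keyword sets.

    Assumption: scores are computed per image; the study may instead
    have averaged per label, which the abstract does not specify.
    """
    hits = len(predicted & truth)  # keywords that are both predicted and correct
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def mean_scores(pairs: Iterable[Tuple[Set[str], Set[str]]]) -> Tuple[float, float, float]:
    """Average precision, recall, and F1 over (predicted, truth) pairs."""
    scores = [annotation_scores(p, t) for p, t in pairs]
    if not scores:
        return 0.0, 0.0, 0.0
    n = len(scores)
    return (
        sum(s[0] for s in scores) / n,
        sum(s[1] for s in scores) / n,
        sum(s[2] for s in scores) / n,
    )


# Hypothetical example: three predicted labels, four ground-truth labels.
p, r, f = annotation_scores({"sky", "sea", "boat"}, {"sea", "boat", "beach", "people"})
```

In this example two of the three predicted keywords are correct and two of the four true keywords are recovered, so precision is 2/3 and recall is 1/2. Predicting more keywords per image tends to raise recall at the expense of precision, which is the balance the abstract refers to.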
