In Search of a Suitable Epoch Value for the Optimum Result in Image Captioning

This work addresses describing the content of an image using a machine-learning-based approach. To gain deeper insight into the topic, a current state-of-the-art image caption generation technique is taken as the baseline. The goal of this work is to find a suitable epoch value during the training phase that maximizes the performance of the system. The proposed model is composed of six major components: 1) a data pre-processing unit; 2) a Convolutional Neural Network (CNN) as the encoder; 3) an attention mechanism; 4) a Recurrent Neural Network (RNN) as the decoder; 5) beam search to find the most likely caption; and 6) sentence generation and evaluation. The model is trained to maximize the likelihood of the description sentence for a given training image. Experiments on the test dataset show the accuracy of the model and the fluency of the language it learns solely from image descriptions. The challenges of this work are discussed in detail at the end of the paper.
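The search for a suitable epoch value is commonly operationalized as early stopping on a held-out validation metric: train for additional epochs only while the validation loss keeps improving, and keep the epoch at which it was lowest. The sketch below illustrates this idea in plain Python; the function name, the `patience` parameter, and the use of validation loss (rather than, say, BLEU) are illustrative assumptions, not the authors' exact procedure.

```python
def select_epoch(val_losses, patience=3):
    """Pick the 1-indexed epoch minimizing validation loss,
    stopping once the loss has not improved for `patience` epochs.

    `val_losses` is the per-epoch validation loss recorded during training.
    """
    best_loss = float("inf")
    best_epoch = 0
    stale = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, stale = loss, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                break  # further training is unlikely to help
    return best_epoch
```

For a typical loss curve that decreases and then overfits, e.g. `[1.0, 0.8, 0.7, 0.75, 0.9, 0.95]`, the function returns epoch 3, the point of lowest validation loss.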
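Beam search, the fifth component above, keeps the top-k highest-probability partial captions at each decoding step instead of committing to the single best next word. A minimal generic sketch follows; the `step_fn` interface (a function returning a next-token probability distribution for a partial sequence) is an assumption for illustration, standing in for the RNN decoder's softmax output.

```python
import math

def beam_search(step_fn, start_token, end_token, beam_width=3, max_len=10):
    """Generic beam search over a next-token distribution.

    `step_fn(sequence)` returns a dict {token: probability} for the
    next token given the partial sequence. Scores are summed log-probs.
    """
    beams = [([start_token], 0.0)]  # (sequence, log-probability)
    completed = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, p in step_fn(seq).items():
                candidates.append((seq + [tok], score + math.log(p)))
        # keep only the `beam_width` highest-scoring partial sequences
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            if seq[-1] == end_token:
                completed.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    return max(completed + beams, key=lambda c: c[1])[0]
```

With `beam_width=1` this degenerates to greedy decoding; wider beams trade compute for a better approximation of the most likely caption under the model.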
