Cross-modal multi-label image classification modeling and recognition based on nonlinear

Abstract Predicting the labels that co-occur in a picture has recently become a popular strategy in multi-label image recognition. Previous work has concentrated on capturing label correlation but has neglected to properly fuse picture features with label embeddings, which substantially slows model convergence and limits further gains in multi-label recognition accuracy. To better classify labeled training samples of the corresponding categories in image classification, a cross-modal multi-label image classification modeling and recognition method based on nonlinear is proposed. Separate multi-label classification models based on deep convolutional neural networks are constructed for the visual and text modalities. The visual classification model uses single-label natural images and single-label simple biomedical images to achieve heterogeneous and homogeneous transfer learning, respectively, capturing the general features of the general domain and the domain-specific features of the biomedical domain. The text classification model uses the description text of simple biomedical images to achieve homogeneous transfer learning. The experimental results show that the multi-label classification model combining the two modalities obtains a Hamming loss similar to the best performance on the evaluation task, and the macro-averaged F1 score increases from 0.20 to 0.488, reported as an improvement of about 52.5%. The cross-modal multi-label image classification algorithm better alleviates overfitting in most classes and has better cross-modal retrieval performance. In addition, the effectiveness and rationality of the two cross-modal mapping techniques are verified.
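The two evaluation measures named in the abstract, Hamming loss and macro-averaged F1, can be illustrated with a small sketch. The abstract does not say how the visual and text predictions are combined, so the weighted averaging in `fuse_probabilities`, the 0.5 threshold, and all toy scores below are assumptions made purely for illustration, not the authors' method or data.

```python
from typing import List

Matrix = List[List[float]]
Labels = List[List[int]]


def fuse_probabilities(visual: Matrix, text: Matrix, weight: float = 0.5) -> Matrix:
    """Assumed late fusion: weighted average of per-label probabilities."""
    return [
        [weight * v + (1.0 - weight) * t for v, t in zip(v_row, t_row)]
        for v_row, t_row in zip(visual, text)
    ]


def binarize(probs: Matrix, threshold: float = 0.5) -> Labels:
    """Turn per-label probabilities into 0/1 decisions."""
    return [[1 if p >= threshold else 0 for p in row] for row in probs]


def hamming_loss(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of (sample, label) pairs that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for t_row, p_row in zip(y_true, y_pred)
        for t, p in zip(t_row, p_row)
    )
    return wrong / total


def macro_f1(y_true: Labels, y_pred: Labels) -> float:
    """Unweighted mean of the per-label F1 scores."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # A label with no true or predicted positives contributes 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_labels


# Toy data (hypothetical): 4 images, 3 labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
visual_scores = [[0.9, 0.2, 0.4], [0.3, 0.6, 0.1], [0.8, 0.3, 0.2], [0.1, 0.4, 0.7]]
text_scores = [[0.7, 0.1, 0.8], [0.1, 0.9, 0.3], [0.6, 0.9, 0.1], [0.95, 0.2, 0.9]]

y_pred = binarize(fuse_probabilities(visual_scores, text_scores))
print("Hamming loss:", hamming_loss(y_true, y_pred))
print("Macro F1:", macro_f1(y_true, y_pred))
```

Hamming loss counts every individual label decision equally, whereas macro F1 gives each label the same weight regardless of how often it occurs, so rare labels influence it far more. That difference is one reason a model can match the best Hamming loss while its macro F1 still has considerable room to improve.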
