A Multi-label Multimodal Deep Learning Framework for Imbalanced Data Classification

Social media and Web services have produced a substantial amount of multimedia content. This explosion of multimedia data has brought the multimedia community both new challenges and exciting opportunities. This paper presents a new multimedia framework that addresses some of the main challenges in this area: a multi-label multimodal framework for imbalanced data classification. The framework uses audio, visual, and textual data modalities and automatically generates static and temporal features with spatio-temporal deep neural networks. It handles data with non-uniform distributions through a weighted multi-label classifier. To evaluate the framework, a video dataset of natural disasters is used for multi-label classification. Extensive experiments on this dataset show that the proposed framework outperforms existing work.
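The abstract does not specify how the classifier is weighted, so the following is only a minimal sketch of one common approach, not the authors' method. It assumes per-modality feature vectors have already been extracted (here they are toy numbers standing in for the deep audio, visual, and textual features), fuses them by simple concatenation (an assumption), and trains one logistic unit per label (binary relevance) with a weighted binary cross-entropy in which each label's positive class is weighted by `n_neg / n_pos`. All function names (`fuse_modalities`, `positive_weights`, `train`, `predict`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def fuse_modalities(audio: Sequence[float], visual: Sequence[float],
                    text: Sequence[float]) -> List[float]:
    """Early fusion by concatenation (an illustrative assumption)."""
    return list(audio) + list(visual) + list(text)


def positive_weights(labels: List[List[int]]) -> List[float]:
    """Per-label weight n_neg / n_pos, so rare labels weigh more in the loss."""
    n = len(labels)
    weights = []
    for j in range(len(labels[0])):
        pos = sum(row[j] for row in labels)
        weights.append((n - pos) / pos if pos > 0 else 1.0)
    return weights


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def weighted_bce(probs: Sequence[float], targets: Sequence[int],
                 pos_w: Sequence[float]) -> float:
    """Mean over labels of -(w*y*log p + (1-y)*log(1-p))."""
    eps = 1e-12
    total = 0.0
    for p, y, w in zip(probs, targets, pos_w):
        total += -(w * y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    return total / len(probs)


def train(features: List[List[float]], labels: List[List[int]],
          epochs: int = 500, lr: float = 0.1
          ) -> Tuple[List[List[float]], List[float], List[float]]:
    """Binary relevance: one logistic unit per label, trained by SGD."""
    dim, num_labels = len(features[0]), len(labels[0])
    pos_w = positive_weights(labels)
    W = [[0.0] * dim for _ in range(num_labels)]
    b = [0.0] * num_labels
    for _ in range(epochs):
        for x, y in zip(features, labels):
            for j in range(num_labels):
                p = sigmoid(sum(wk * xk for wk, xk in zip(W[j], x)) + b[j])
                # d(weighted BCE)/dz = w*y*(p - 1) + (1 - y)*p
                g = pos_w[j] * y[j] * (p - 1.0) + (1 - y[j]) * p
                for k in range(dim):
                    W[j][k] -= lr * g * x[k]
                b[j] -= lr * g
    return W, b, pos_w


def predict(W: List[List[float]], b: List[float], x: Sequence[float],
            threshold: float = 0.5) -> List[int]:
    """Independent 0/1 decision per label, so several labels can be active."""
    return [1 if sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + bj) >= threshold
            else 0 for w, bj in zip(W, b)]


# Toy data: label 0 follows the visual feature, label 1 (rare) the audio one.
X = [fuse_modalities([1.0], [0.0], [0.0]),
     fuse_modalities([0.0], [1.0], [0.0]),
     fuse_modalities([0.0], [1.0], [1.0]),
     fuse_modalities([0.0], [0.0], [1.0]),
     fuse_modalities([0.0], [1.0], [0.0])]
Y = [[0, 1], [1, 0], [1, 0], [0, 0], [1, 0]]
W, b, pos_w = train(X, Y)
```

Here the rare second label (one positive in five samples) receives a weight of 4.0, while the frequent first label receives 2/3, which is the intended effect of inverse-frequency weighting on imbalanced labels. Thresholding each label independently, rather than taking an argmax, is what makes the output multi-label.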
