Deep Learning for Imbalanced Multimedia Data Classification

Classification of imbalanced data is an important research problem because many real-world data sets have skewed class distributions, in which the majority of data instances (examples) belong to one class and far fewer belong to the others. In many applications the minority instances represent the concept of interest (e.g., fraud in banking operations or abnormal cells in medical data), yet a classifier induced from an imbalanced data set is likely to be biased towards the majority class and to show very poor classification accuracy on the minority class. Despite extensive research efforts, imbalanced data classification remains one of the most challenging problems in data mining and machine learning, especially for multimedia data. To tackle this challenge, we propose an extended deep learning approach for classifying skewed multimedia data sets. Specifically, we investigate the integration of bootstrapping methods with a state-of-the-art deep learning approach, Convolutional Neural Networks (CNNs), through extensive empirical studies. Because deep learning approaches such as CNNs are usually computationally expensive, we propose feeding low-level features to the CNNs and show that this is feasible: it achieves promising performance while substantially reducing training time. The experimental results demonstrate the effectiveness of our framework in classifying severely imbalanced data in the TRECVID data set.
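
The abstract does not spell out the bootstrapping scheme, so the following is only a minimal sketch of one common variant: each training batch is drawn with replacement so that the minority (positive) and majority (negative) classes are equally represented. The function name `balanced_bootstrap_batch`, the 0/1 label encoding, and the 50/50 split are assumptions made for illustration, not details taken from the paper.

```python
import random


def balanced_bootstrap_batch(labels, batch_size, rng):
    """Return indices for one batch with equal class representation.

    Hypothetical helper: indices are sampled with replacement within
    each class, so the few minority instances are reused across batches.
    Labels are assumed to be 1 (minority) or 0 (majority).
    """
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    if not pos_idx or not neg_idx:
        raise ValueError("both classes must be present")

    n_pos = batch_size // 2          # assumed 50/50 split
    n_neg = batch_size - n_pos

    # Sampling with replacement is what makes this a bootstrap draw.
    batch = rng.choices(pos_idx, k=n_pos) + rng.choices(neg_idx, k=n_neg)
    rng.shuffle(batch)               # avoid class-ordered batches
    return batch


# Toy data: 10 positives among 1000 instances (1% minority class).
labels = [1] * 10 + [0] * 990
rng = random.Random(0)
batch = balanced_bootstrap_batch(labels, batch_size=32, rng=rng)
pos_fraction = sum(labels[i] for i in batch) / len(batch)
```

In this toy setup the positive fraction within a batch rises from 1% in the full data set to 50%, which is the kind of rebalancing a bootstrapped training loop would present to the network. The indices in `batch` would then select the low-level feature vectors fed to the CNN.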
