Efficient Imbalanced Multimedia Concept Retrieval by Deep Learning on Spark Clusters

The classification of imbalanced datasets has recently attracted significant attention because of its implications for many real-world applications. Classifiers trained on datasets with skewed class distributions tend to favor the majority class and are biased against the minority class. Despite extensive research interest, imbalanced data classification remains a challenge in data mining, especially for multimedia data. To address this problem, we develop a deep learning solution based on convolutional neural networks (CNNs) and integrated with a bootstrapping technique. Because training CNNs on big datasets is computationally expensive, we propose to extract features from pre-trained CNN models and feed those features into a separate fully connected neural network. A Spark implementation of the model shows promising feasibility and scalability in handling big datasets.
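The abstract does not specify which bootstrapping scheme is used. As a hedged illustration only, the sketch below assumes one common variant, balanced bootstrap sampling. Each training batch is drawn with replacement so that every class contributes the same number of samples. Each sample is assumed to be a feature vector already extracted from a pre-trained CNN. The function name `balanced_bootstrap` and the `(features, label)` sample layout are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical sample layout: (feature_vector, label). The feature vector is
# assumed to have been extracted offline from a pre-trained CNN.
Sample = Tuple[List[float], int]


def balanced_bootstrap(samples: Sequence[Sample], batch_size: int,
                       rng: random.Random) -> List[Sample]:
    """Draw one training batch with replacement, giving every class an equal share.

    Minority-class samples are repeated across draws and the majority class
    is subsampled, so the downstream fully connected network sees a balanced
    batch. If batch_size is not divisible by the number of classes, the
    remainder is dropped.
    """
    by_label: Dict[int, List[Sample]] = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)
    labels = sorted(by_label)
    per_class = batch_size // len(labels)
    batch: List[Sample] = []
    for label in labels:
        # random.Random.choices samples with replacement (the bootstrap step).
        batch.extend(rng.choices(by_label[label], k=per_class))
    rng.shuffle(batch)  # mix classes so batches are not ordered by label
    return batch


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy imbalanced dataset: 95 negatives, 5 positives, 4-dimensional features.
    data = [([rng.random() for _ in range(4)], 0) for _ in range(95)]
    data += [([rng.random() for _ in range(4)], 1) for _ in range(5)]
    batch = balanced_bootstrap(data, batch_size=32, rng=rng)
    print(sum(label for _, label in batch), len(batch))  # prints "16 32"
```

Sampling with replacement keeps the cost of each batch independent of how rare the minority class is, which matters when features for a large video corpus are precomputed once and then reused across many training batches.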
