Semantic Event Detection Using Ensemble Deep Learning

Numerous deep learning architectures have been designed for a variety of tasks in the past few years. However, it is almost impossible for a single model to perform well across all kinds of scenarios and datasets. Therefore, in this paper we present an ensemble deep learning framework that not only reduces the information loss and over-fitting problems caused by single models, but also addresses the imbalanced data issue in multimedia big data. First, a suite of deep learning algorithms is utilized for deep feature selection. Thereafter, an enhanced ensemble algorithm is developed based on the performance of each single Support Vector Machine (SVM) classifier on each deep feature set. We evaluate the proposed ensemble deep learning framework on a large and highly imbalanced video dataset containing natural disaster events. Experimental results demonstrate the effectiveness of the proposed framework for semantic event detection and show that it outperforms several state-of-the-art deep learning architectures, as well as handcrafted features integrated with ensemble and non-ensemble algorithms.
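
The abstract does not say how each SVM's performance is turned into ensemble weights. The following is a minimal sketch of one possible scheme, performance-weighted late fusion. It assumes that every classifier outputs per-class scores and that its per-class F1 on a held-out validation set is the performance measure. The feature-set names (`cnn_a`, `cnn_b`), the helper functions `class_weights` and `fuse`, the choice of F1, and the toy numbers are all hypothetical illustrations, not the authors' enhanced ensemble algorithm.

```python
from typing import Dict, List, Sequence

# Illustrative sketch only: per-class, performance-weighted late fusion of
# classifiers trained on different deep feature sets. The feature-set names
# ("cnn_a", "cnn_b") and the use of validation F1 as the performance measure
# are assumptions, not the paper's algorithm.


def class_weights(val_f1: Dict[str, Sequence[float]]) -> Dict[str, List[float]]:
    """Map each classifier's per-class validation F1 to per-class weights.

    For every class, the weights across classifiers sum to 1. If every
    classifier scores 0 on a class, that class falls back to uniform weights.
    """
    names = list(val_f1)
    n_classes = len(val_f1[names[0]])
    weights = {name: [0.0] * n_classes for name in names}
    for c in range(n_classes):
        total = sum(val_f1[name][c] for name in names)
        for name in names:
            weights[name][c] = val_f1[name][c] / total if total > 0 else 1.0 / len(names)
    return weights


def fuse(probs: Dict[str, Sequence[Sequence[float]]],
         weights: Dict[str, List[float]]) -> List[int]:
    """Combine per-class scores with per-class weights; return argmax labels."""
    names = list(probs)
    n_samples = len(probs[names[0]])
    labels = []
    for i in range(n_samples):
        n_classes = len(probs[names[0]][i])
        # Fused score for class c: sum over classifiers of w[name][c] * p[name][i][c].
        fused = [sum(weights[name][c] * probs[name][i][c] for name in names)
                 for c in range(n_classes)]
        labels.append(max(range(n_classes), key=fused.__getitem__))
    return labels


if __name__ == "__main__":
    # Class 0 = frequent background, class 1 = rare disaster event (toy numbers).
    val_f1 = {"cnn_a": [0.90, 0.20], "cnn_b": [0.80, 0.60]}
    probs = {"cnn_a": [[0.75, 0.25], [0.90, 0.10]],
             "cnn_b": [[0.30, 0.70], [0.80, 0.20]]}
    print(fuse(probs, class_weights(val_f1)))  # → [1, 0]
```

In this toy run, a plain unweighted average would label the first clip as class 0 (0.525 vs. 0.475). The per-class weights instead let `cnn_b`, which validates much better on the rare class, dominate that class's score, so the clip is labeled 1. Weighting per class rather than per classifier is one way such an ensemble can counter class imbalance.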