Multimodal deep learning based on multiple correspondence analysis for disaster management

The fast and explosive growth of digital data on social media and the World Wide Web has created numerous opportunities and research activities in multimedia big data. Among them, disaster management applications have attracted considerable attention in recent years due to their impact on society and government. This study targets content analysis and mining for disaster management. Specifically, a multimedia big data framework based on advanced deep learning techniques is proposed. First, a video dataset of natural disasters is collected from YouTube. Then, two separate deep networks, a temporal audio model and a spatio-temporal visual model, are presented to analyze the audio and visual modalities in video clips effectively. Thereafter, the outputs of both models are integrated using the proposed fusion model based on the Multiple Correspondence Analysis (MCA) algorithm, which considers the correlations between data modalities and the final classes. The proposed multimodal framework is evaluated on the collected disaster dataset and compared with several state-of-the-art single-modality and fusion techniques. The results demonstrate the effectiveness of both the visual model and the fusion model compared to the baseline approaches. Specifically, the accuracy of the final multi-class classification using the proposed MCA-based fusion reaches 73% on this challenging dataset.
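To make the fusion step concrete, the sketch below illustrates one plausible way an MCA-derived weight could combine the class scores of the audio and visual models. This is a minimal illustration under stated assumptions, not the paper's exact algorithm: the bin count, the inertia-share weighting, and the names `mca_modality_weight` and `mca_fuse` are hypothetical choices made for this example.

```python
import numpy as np

def mca_modality_weight(scores, labels, n_bins=3, n_dims=2):
    """Estimate a fusion weight for one modality via a simplified MCA.

    scores : (n_samples, n_classes) softmax outputs from the modality's model
             on a held-out validation set.
    labels : (n_samples,) ground-truth class indices for the same samples.
    Returns the share of MCA inertia captured by the leading principal axes,
    used here as a proxy for how informative the discretized scores are.
    """
    n_samples, n_classes = scores.shape
    # Discretize each class-score column into nominal bins (feature-value pairs).
    bins = np.clip((scores * n_bins).astype(int), 0, n_bins - 1)
    # Build the indicator (complete disjunctive) matrix: one-hot score bins plus labels.
    blocks = [np.eye(n_bins)[bins[:, j]] for j in range(n_classes)]
    blocks.append(np.eye(n_classes)[labels])
    Z = np.hstack(blocks)
    Z = Z[:, Z.sum(axis=0) > 0]          # drop empty categories
    # Correspondence matrix and standardized residuals (the core MCA step).
    P = Z / Z.sum()
    r = P.sum(axis=1, keepdims=True)
    c = P.sum(axis=0, keepdims=True)
    S = (P - r @ c) / np.sqrt(r @ c)
    # Principal inertias from the singular values of S.
    sv = np.linalg.svd(S, compute_uv=False)
    return float((sv[:n_dims] ** 2).sum() / (sv ** 2).sum())

def mca_fuse(audio_scores, visual_scores, labels):
    """Weighted late fusion of audio and visual class scores."""
    w_a = mca_modality_weight(audio_scores, labels)
    w_v = mca_modality_weight(visual_scores, labels)
    fused = (w_a * audio_scores + w_v * visual_scores) / (w_a + w_v)
    return fused.argmax(axis=1)          # predicted disaster class per clip
```

In a setup like this, the per-modality weights would be estimated once on validation data and then applied to the test-set scores produced by the audio and visual deep models.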
