Heterogeneous visual features integration for image recognition optimization in internet of things

Abstract Recently, a large number of physical devices, together with distributed information systems, have been deployed in the internet of things (IoT) and are collecting more and more images. Recognizing these collected images poses an important optimization challenge in IoT. In particular, most existing methods adopt only shallow learning models to integrate the various features of images for recognition, which limits classification accuracy. In this paper, we propose a multimodal deep learning (MMDL) approach that integrates heterogeneous visual features by treating each type of visual feature as one modality for image recognition optimization in IoT. In our scheme, we extract the high-level abstraction of each modality with a stacked autoencoder. Furthermore, we design a back-propagation algorithm with shared weights learned from a softmax layer to update the pretrained parameters of the multiple stacked autoencoders simultaneously. The integration is performed by concatenating the last hidden layers of the multimodal stacked-autoencoder architecture. Extensive experiments are carried out on three datasets, i.e., Animals with Attributes, NUS-WIDE-OBJECT, and Handwritten Numerals, in comparison with SVM, SAE, and AMMSS. The results demonstrate that our scheme achieves superior performance on heterogeneous visual feature integration for image recognition optimization in IoT.
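The architecture described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the layer sizes, learning rate, sigmoid activations, and every function name (`init_encoder`, `encode`, `forward`, `finetune_step`) are assumptions chosen for illustration, and random weights stand in for the pretrained stacked autoencoders. The sketch reads "shared weights learned from a softmax layer" as a single softmax layer over the concatenated last hidden layers, whose error signal is back-propagated into every modality's encoder in the same step.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def init_encoder(rng, sizes):
    """Random weights standing in for one pretrained stacked-autoencoder encoder."""
    return [(rng.normal(0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def encode(x, layers):
    """Return the activations of every layer, input first, last hidden layer last."""
    acts = [x]
    for W, b in layers:
        acts.append(sigmoid(acts[-1] @ W + b))
    return acts

def forward(modalities, encoders, W_out, b_out):
    all_acts = [encode(x, enc) for x, enc in zip(modalities, encoders)]
    # Integration step: concatenate the last hidden layer of each modality.
    fused = np.concatenate([a[-1] for a in all_acts], axis=1)
    return all_acts, fused, softmax(fused @ W_out + b_out)

def finetune_step(modalities, y_onehot, encoders, W_out, b_out, lr=0.5):
    """One joint gradient step on cross-entropy; updates all parameters in place."""
    all_acts, fused, probs = forward(modalities, encoders, W_out, b_out)
    n = y_onehot.shape[0]
    loss = -np.sum(y_onehot * np.log(probs + 1e-12)) / n
    d_logits = (probs - y_onehot) / n
    d_fused = d_logits @ W_out.T          # gradient w.r.t. the fused layer
    W_out -= lr * fused.T @ d_logits
    b_out -= lr * d_logits.sum(axis=0)
    offset = 0
    for acts, enc in zip(all_acts, encoders):
        width = acts[-1].shape[1]
        delta = d_fused[:, offset:offset + width]  # this modality's slice
        offset += width
        for i in reversed(range(len(enc))):
            W, b = enc[i]
            dz = delta * acts[i + 1] * (1 - acts[i + 1])  # sigmoid derivative
            delta = dz @ W.T              # propagate before updating W
            W -= lr * acts[i].T @ dz
            b -= lr * dz.sum(axis=0)
    return loss

# Toy usage: two hypothetical modalities with 8 and 5 input features.
rng = np.random.default_rng(0)
encoders = [init_encoder(rng, (8, 6, 4)), init_encoder(rng, (5, 4, 3))]
n_classes = 3
W_out = rng.normal(0, 0.1, (4 + 3, n_classes))
b_out = np.zeros(n_classes)
X = [rng.normal(size=(12, 8)), rng.normal(size=(12, 5))]
y = np.eye(n_classes)[rng.integers(0, n_classes, 12)]
losses = [finetune_step(X, y, encoders, W_out, b_out) for _ in range(50)]
```

In this sketch the softmax gradient is split by column range so that each encoder receives only the part of the error that belongs to its own hidden units, while all encoders are updated within the same step. Layer-wise pretraining of the autoencoders is omitted here.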
