EnsemV3X: a novel ensembled deep learning architecture for multi-label scene classification

Convolutional neural networks are widely used for image classification, typically by pretraining on ImageNet and then fine-tuning, whereby the learned features are adapted to the target task. ImageNet is a large database consisting of 15 million images belonging to 22,000 categories. Its images, collected from the Web, are labeled by human annotators through the Amazon Mechanical Turk crowdsourcing tool. ImageNet is useful for transfer learning because of the sheer volume of its data and the number of object classes available. Transfer learning with pretrained models is useful because it helps to build computer vision models accurately and inexpensively: models that have been pretrained on substantial datasets are repurposed for the requirements of the target task. Scene recognition is a widely used application of computer vision in many communities and industries, such as tourism. This study demonstrates multilabel scene classification using five architectures, namely, VGG16, VGG19, ResNet50, InceptionV3, and Xception, initialized with the ImageNet weights available in the Keras library. The performance of the different architectures is comprehensively compared. Finally, EnsemV3X is presented. The proposed model, which has a reduced number of parameters, achieves an accuracy of 91% and outperforms the state-of-the-art models Inception and Xception.
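The abstract does not state how the ensemble combines its member networks, so the following is only a minimal sketch of one common option for multilabel prediction: averaging the per-label sigmoid probabilities of several models and thresholding the mean. The function name `ensemble_multilabel`, the threshold of 0.5, and the probability values are all hypothetical and are not taken from the study.

```python
from typing import List, Sequence


def ensemble_multilabel(
    prob_sets: Sequence[Sequence[Sequence[float]]],
    threshold: float = 0.5,
) -> List[List[int]]:
    """Average per-label probabilities across models, then threshold.

    prob_sets[m][i][k] is the probability that model m assigns to
    label k for image i. Simple averaging is an assumption here, not
    necessarily the fusion rule used by EnsemV3X.
    """
    n_models = len(prob_sets)
    n_samples = len(prob_sets[0])
    predictions = []
    for i in range(n_samples):
        # Collect the probability row for image i from every model.
        rows = [model_probs[i] for model_probs in prob_sets]
        # Mean probability per label across the models.
        mean_probs = [sum(col) / n_models for col in zip(*rows)]
        # In the multilabel setting each label is decided independently.
        predictions.append([1 if p >= threshold else 0 for p in mean_probs])
    return predictions


# Hypothetical sigmoid outputs for 2 images and 3 scene labels.
inception_probs = [[0.9, 0.2, 0.6], [0.1, 0.8, 0.4]]
xception_probs = [[0.7, 0.4, 0.3], [0.3, 0.9, 0.7]]

labels = ensemble_multilabel([inception_probs, xception_probs])
print(labels)
```

Because every label is thresholded on its own, an image can receive several scene labels at once or none at all, which is what distinguishes this setting from single-label softmax classification.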
