Reshaping inputs for convolutional neural network: Some common and uncommon methods

Abstract: Convolutional neural networks have become very common in computer vision in recent years, but they come with a severe restriction on the size of the input image: most are designed to accept only images of a fixed size. This creates several challenges during data acquisition and model deployment. The common practice for overcoming this limitation is to reshape the input images so that they can be fed into the network. Many standard pre-trained networks and datasets are set up to work with square images. In this work we analyze 25 different reshaping methods across 6 datasets from different domains, trained on three well-known architectures: Inception-V3 (an extension of GoogLeNet), the residual network ResNet-18, and the 121-layer DenseNet. While some reshaping methods, such as “interpolation” and “cropping”, are commonly used with convolutional neural networks, some uncommon techniques such as “containing”, “tiling” and “mirroring” are also demonstrated. In total, 450 neural networks (25 methods × 6 datasets × 3 architectures) were trained from scratch to analyze the convergence of the validation loss and the accuracy obtained on the test data. Statistical measures are provided to demonstrate the dependence between parameter choices and datasets. Key observations include the benefits of using randomized processes and the poor performance of the commonly used “cropping” techniques. The paper intends to provide empirical evidence to guide readers in choosing a proper technique for reshaping inputs to their convolutional neural networks. The official code is available at https://github.com/DVLP-CMATERJU/Reshaping-Inputs-for-CNN.
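To make the named method families concrete, the following is a minimal NumPy sketch of how a rectangular image might be reshaped to a fixed square. It is not taken from the official repository. The function names (`resize_nearest`, `center_crop_square`, `pad_to_square`, `reshape_input`) are hypothetical, nearest-neighbour sampling stands in for whichever interpolation the paper uses, and the readings of “containing” (zero-padded letterbox), “tiling” (wrap-around padding) and “mirroring” (reflected padding) are assumptions based on the terms alone.

```python
import numpy as np

def resize_nearest(img, size):
    """Resize to size x size with nearest-neighbour sampling (aspect ratio is not preserved)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    return img[rows][:, cols]

def center_crop_square(img):
    """Keep the largest centred square; content outside it is discarded."""
    h, w = img.shape[:2]
    s = min(h, w)
    top, left = (h - s) // 2, (w - s) // 2
    return img[top:top + s, left:left + s]

def pad_to_square(img, mode):
    """Pad the shorter side symmetrically until the image is square."""
    h, w = img.shape[:2]
    s = max(h, w)
    ph, pw = s - h, s - w
    pads = [(ph // 2, ph - ph // 2), (pw // 2, pw - pw // 2)]
    pads += [(0, 0)] * (img.ndim - 2)    # leave any channel axis untouched
    return np.pad(img, pads, mode=mode)

def reshape_input(img, size, method):
    """Dispatch over five illustrative methods (assumed interpretations, see lead-in)."""
    if method == "interpolate":
        square = img                               # stretch directly
    elif method == "crop":
        square = center_crop_square(img)
    elif method == "contain":
        square = pad_to_square(img, "constant")    # zero-filled borders
    elif method == "tile":
        square = pad_to_square(img, "wrap")        # repeat the image
    elif method == "mirror":
        square = pad_to_square(img, "symmetric")   # reflect at the edges
    else:
        raise ValueError(f"unknown method: {method}")
    return resize_nearest(square, size)

# Example: a 4 x 6 RGB image reshaped to 8 x 8 by each method.
img = (np.arange(4 * 6 * 3) % 255 + 1).astype(np.uint8).reshape(4, 6, 3)
outputs = {m: reshape_input(img, 8, m)
           for m in ("interpolate", "crop", "contain", "tile", "mirror")}
```

Every method returns an array of the same fixed shape, which is what a fixed-input network needs; the methods differ only in what is distorted, discarded, or synthesized to get there.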
