Real-time image recognition using weighted spatial pyramid networks

AbstractThe latest-generation earth observation instruments on airborne and satellite platforms are currently producing an almost continuous high-dimensional data stream. This exponentially growing data poses a new challenge for real-time image processing and recognition. Making full and effective use of the spectral information and spatial structure information of high-resolution remote sensing image is the key to the processing and recognition of high-resolution remote sensing data. In this paper, the adaptive multipoint moment estimation (AMME) stochastic optimization algorithm is proposed for the first time by using the finite lower-order moments and adding the estimating points. This algorithm not only reduces the probability of local optimum in the learning process, but also improves the convergence rate of the convolutional neural network (Lee Cun et al. in Advances in neural information processing systems, 1990). Second, according to the remote sensing image with characteristics of complex background and small sensitive targets, and by automatic discovery, locating small targets, and giving high weights, we proposed a feature extraction method named weighted pooling to further improve the performance of real-time image recognition. We combine the AMME and weighted pooling with the spatial pyramid representation (Harada et al. in Comput Vis Pattern Recognit 1617–1624, 2011) algorithm to form a new, multiscale, and multilevel real-time image recognition model and name it weighted spatial pyramid networks (WspNet). At the end, we use the MNIST, ImageNet, and natural disasters under remote sensing data sets to test WspNet. Compared with other real-time image recognition models, WspNet achieve a new state of the art in terms of convergence rate and image feature extraction compared with conventional stochastic gradient descent method [like AdaGrad, AdaDelta and Adam (Zeiler in Comput Sci, 2012; Kingma and Ba in Comput Sci, 2014; Duchi et al. in J Mach Learn Res 12(7):2121–2159, 2011] and pooling method [like max-pooling, avg-pooling and stochastic-pooling (Zeiler and Fergus in stochastic-pooling for regularization of deep convolutional neural networks, 2013)].

[1]  LinLin Shen,et al.  Robust object representation by boosting-like deep learning architecture , 2016, Signal Process. Image Commun..

[2]  Chen Chen,et al.  Output Constraint Transfer for Kernelized Correlation Filter in Tracking , 2016, IEEE Transactions on Systems, Man, and Cybernetics: Systems.

[3]  Jean Ponce,et al.  A Theoretical Analysis of Feature Pooling in Visual Recognition , 2010, ICML.

[4]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Xiang Zhang,et al.  OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[6]  Alfredo H.-S. Ang,et al.  Reliability of Structures and Structural Systems , 1968 .

[7]  Lei Wang,et al.  Boosting-Like Deep Convolutional Network for Pedestrian Detection , 2015, CCBR.

[8]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Mouhannad Ali,et al.  Real-time raindrop detection based on cellular neural networks for ADAS , 2016, Journal of Real-Time Image Processing.

[10]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[11]  Yasuo Kuniyoshi,et al.  Discriminative spatial pyramid , 2011, CVPR 2011.

[12]  John Salvatier,et al.  Theano: A Python framework for fast computation of mathematical expressions , 2016, ArXiv.

[13]  Roberto Cipolla,et al.  PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[14]  Martín Abadi,et al.  TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.

[15]  Geoffrey E. Hinton,et al.  Attend, Infer, Repeat: Fast Scene Understanding with Generative Models , 2016, NIPS.

[16]  Parul Parashar,et al.  Neural Networks in Machine Learning , 2014 .

[17]  Matthew D. Zeiler ADADELTA: An Adaptive Learning Rate Method , 2012, ArXiv.

[18]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[19]  Zhenyu Wang,et al.  Face recognition using adaptive local ternary patterns method , 2016, Neurocomputing.

[20]  Alessio Del Bue,et al.  Adaptive Local Movement Modelling for Object Tracking , 2015, 2015 IEEE Winter Conference on Applications of Computer Vision.

[21]  E. Rosenblueth Point estimates for probability moments. , 1975, Proceedings of the National Academy of Sciences of the United States of America.

[22]  Rob Fergus,et al.  Stochastic Pooling for Regularization of Deep Convolutional Neural Networks , 2013, ICLR.

[23]  R. Vaillant,et al.  Original approach for the localisation of objects in images , 1994 .

[24]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[25]  R. Vaillant,et al.  An original approach for the localization of objects in images , 1993 .

[26]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[27]  Rongrong Ji,et al.  Bounding Multiple Gaussians Uncertainty with Application to Object Tracking , 2016, International Journal of Computer Vision.

[28]  Richard F. Lyon,et al.  Neural Networks for Machine Learning , 2017 .

[29]  Yoram Singer,et al.  Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..

[30]  Wuzhen Shi,et al.  Spatial and temporal pyramid-based real-time gesture recognition , 2017, Journal of Real-Time Image Processing.

[31]  Ning Qian,et al.  On the momentum term in gradient descent learning algorithms , 1999, Neural Networks.

[32]  Lawrence D. Jackel,et al.  Handwritten Digit Recognition with a Back-Propagation Network , 1989, NIPS.

[33]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.