Learning with Rethinking: Recurrently Improving Convolutional Neural Networks through Feedback

Recent years have witnessed the great success of convolutional neural network (CNN) based models in the field of computer vision. CNNs learn hierarchically abstracted features from images in an end-to-end manner. However, most existing CNN models learn features through a purely feedforward structure, and no feedback information from top to bottom layers is exploited to let the network refine itself. In this paper, we propose a "Learning with Rethinking" algorithm. By adding a feedback layer and producing an emphasis vector, the model recurrently boosts its performance based on its previous predictions. In particular, the algorithm can be applied to boost any pre-trained model. It is evaluated on four object classification benchmarks: CIFAR-100, CIFAR-10, MNIST-background-image, and ILSVRC-2012. The results demonstrate the advantage of training CNN models with the proposed feedback mechanism.
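To make the feedback mechanism concrete, the following is a minimal sketch of the idea described above, not the authors' exact implementation: a small backbone CNN (hypothetical architecture), a fully connected feedback layer that maps the previous softmax posterior to a per-channel emphasis vector, and channel-wise gating of an intermediate feature map on each subsequent pass. All layer sizes, the sigmoid gating, and the number of passes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RethinkingCNN(nn.Module):
    """Sketch of a CNN with a prediction-driven feedback (emphasis) layer."""

    def __init__(self, num_classes=10, mid_channels=32):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, mid_channels, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.Conv2d(mid_channels, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, num_classes),
        )
        # Feedback layer: maps the previous class posterior to a per-channel
        # emphasis vector that re-weights the intermediate feature map.
        self.feedback = nn.Linear(num_classes, mid_channels)

    def forward(self, x, num_passes=2):
        batch = x.size(0)
        num_classes = self.feedback.in_features
        # First pass starts from a uniform posterior, i.e. no emphasis yet.
        posterior = torch.full((batch, num_classes), 1.0 / num_classes,
                               device=x.device)
        logits = None
        for _ in range(num_passes):
            emphasis = torch.sigmoid(self.feedback(posterior))     # (B, C)
            feat = self.stem(x) * emphasis[:, :, None, None]       # channel gating
            logits = self.head(feat)
            posterior = F.softmax(logits, dim=1)                   # feeds next pass
        return logits


if __name__ == "__main__":
    model = RethinkingCNN(num_classes=10)
    out = model(torch.randn(4, 3, 32, 32), num_passes=3)
    print(out.shape)  # torch.Size([4, 10])
```

In this reading, only the feedback layer is new; the backbone could be any pre-trained network, and additional passes simply re-run it with features re-weighted by the emphasis vector derived from the previous prediction.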
