Deep Boosting: Joint feature selection and analysis dictionary learning in hierarchy

This work investigates how traditional image classification pipelines can be extended into a deep architecture, inspired by the recent successes of deep neural networks. We propose a deep boosting framework based on layer-by-layer joint feature boosting and dictionary learning. In each layer, we construct a dictionary of filters by combining the filters from the lower layer, and iteratively optimize the image representation with a joint discriminative-generative formulation, i.e., minimizing the empirical classification error plus a regularization term on analysis image generation over the training images. The optimization alternates between two steps: (i) to minimize the classification error, we select the most discriminative features using the Gentle AdaBoost algorithm; (ii) given the selected features, we update the filters by gradient descent to minimize the regularization on the analysis image representation. Once the optimization has converged, we learn the next higher layer's representation in the same way. Our model delivers two distinct advantages. First, the layer-wise optimization makes it possible to build very deep architectures. Second, the generated image representation is compact and meaningful. On several visual recognition tasks, our framework outperforms existing state-of-the-art approaches.
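
To make the feature selection step concrete, the following is a minimal sketch of Gentle AdaBoost with regression stumps used as a feature selector. It illustrates the general algorithm only and is not the authors' implementation. The names `fit_stump`, `gentle_adaboost_select`, and `responses` are hypothetical, the data is synthetic, and the use of one-dimensional stumps as weak learners is an assumption. The filter update by gradient descent is not shown, because the abstract does not specify the form of the regularizer.

```python
import numpy as np


def fit_stump(x, y, w):
    """Weighted least-squares regression stump on a single feature.

    Returns (error, threshold, a, b), where the stump predicts
    a if x > threshold else b.
    """
    best = (np.inf, 0.0, 0.0, 0.0)
    # Candidate thresholds: every distinct value except the largest,
    # so that both sides of the split are non-empty.
    for thr in np.unique(x)[:-1]:
        hi = x > thr
        a = np.sum(w[hi] * y[hi]) / np.sum(w[hi])      # weighted mean above
        b = np.sum(w[~hi] * y[~hi]) / np.sum(w[~hi])   # weighted mean below
        err = np.sum(w * (y - np.where(hi, a, b)) ** 2)
        if err < best[0]:
            best = (err, thr, a, b)
    return best


def gentle_adaboost_select(X, y, n_rounds):
    """Pick one feature per round with Gentle AdaBoost.

    X: (n_samples, n_features) feature matrix (assumed here to hold
       filter responses); y: labels in {-1, +1}.
    Returns the selected feature indices and the additive score F.
    """
    n, d = X.shape
    w = np.full(n, 1.0 / n)   # uniform initial sample weights
    F = np.zeros(n)           # additive classifier output on training data
    selected = []
    for _ in range(n_rounds):
        fits = [fit_stump(X[:, j], y, w) for j in range(d)]
        j = int(np.argmin([f[0] for f in fits]))   # most discriminative feature
        _, thr, a, b = fits[j]
        f_m = np.where(X[:, j] > thr, a, b)
        F += f_m
        w *= np.exp(-y * f_m)  # emphasize samples the stump got wrong
        w /= w.sum()
        selected.append(j)
    return selected, F


# Synthetic example: only feature 2 carries the label.
rng = np.random.default_rng(0)
responses = rng.normal(size=(200, 5))
y = np.where(responses[:, 2] > 0, 1.0, -1.0)

selected, F = gentle_adaboost_select(responses, y, n_rounds=3)
print("selected features:", selected)
print("training accuracy:", np.mean(np.sign(F) == y))
```

In this toy setting the selector should concentrate on the single informative feature. In the framework described above, the selected indices would then determine which filters are updated in the second step.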
