Discriminative boosted forest with convolutional neural network-based patch descriptor for object detection

Abstract. Object detection in the presence of intraclass variation is challenging. Existing methods have not found an optimal combination of classifiers and features, particularly for features learned by convolutional neural networks (CNNs). To address this, we propose an object-detection method based on an improved random forest and local image patches represented by CNN features. First, we compute CNN-based patch descriptors for each sample using modified CNNs. Then, we build a random forest whose split functions are defined by a patch selector and a linear projection learned by a linear support vector machine. To improve classification accuracy, the split functions at each depth of the forest form a local classifier, and all local classifiers are assembled layer by layer with a boosting algorithm. The main contributions of our approach are as follows: (1) We propose a new local patch descriptor based on CNN features. (2) We define a patch-based split function that is optimized for maximum class-label purity and minimum classification error over the samples at the node. (3) The local classifiers are assembled by minimizing the global classification error. We evaluate the method on three well-known, challenging datasets: TUD pedestrians, INRIA pedestrians, and UIUC cars. The experiments demonstrate that our method achieves state-of-the-art or competitive performance.
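
The patch-based split function described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names (`train_linear_svm`, `best_patch_split`), the hyperparameters, the Gini impurity criterion, and the toy two-dimensional descriptors standing in for CNN features are all assumptions made for illustration. For each candidate patch, the sketch fits a linear SVM by subgradient descent on the hinge loss, splits the node's samples by the sign of the projection, and keeps the patch whose split gives the purest children.

```python
from collections import Counter


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gini(labels):
    """Gini impurity; 0.0 for a pure (or empty) set of labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def train_linear_svm(X, y, epochs=50, lr=0.1, lam=0.01):
    """Fit w, b by subgradient descent on the L2-regularised hinge loss.

    Labels must be +1 / -1. Hyperparameters are illustrative defaults.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            if t * (dot(w, x) + b) < 1.0:
                # Margin violated: step along the hinge subgradient.
                w = [wi + lr * (t * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * t
            else:
                # Margin satisfied: only the regulariser acts on w.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def best_patch_split(samples, labels):
    """Return (patch index, w, b, impurity) of the purest split.

    samples[i][p] is the descriptor of patch p in sample i
    (a hypothetical layout chosen for this sketch).
    """
    best = None
    for p in range(len(samples[0])):
        X = [s[p] for s in samples]
        w, b = train_linear_svm(X, labels)
        left = [t for x, t in zip(X, labels) if dot(w, x) + b < 0]
        right = [t for x, t in zip(X, labels) if dot(w, x) + b >= 0]
        impurity = (len(left) * gini(left)
                    + len(right) * gini(right)) / len(labels)
        if best is None or impurity < best[3]:
            best = (p, w, b, impurity)
    return best


# Toy node: patch 0 is identical across samples (uninformative),
# patch 1 separates the two classes.
samples = [
    [[1.0, 1.0], [2.0, 0.0]],
    [[1.0, 1.0], [1.5, 0.5]],
    [[1.0, 1.0], [-2.0, 0.0]],
    [[1.0, 1.0], [-1.5, -0.5]],
]
labels = [1, 1, -1, -1]

patch, w, b, impurity = best_patch_split(samples, labels)
print("selected patch:", patch, "weighted Gini impurity:", impurity)
```

On the toy node the selector picks the discriminative patch because the constant patch sends every sample to the same child. The layer-wise boosting of the per-depth local classifiers is outside the scope of this sketch.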
