Paper information - Convolutional Tables Ensemble: classification in microseconds

We study classifiers operating under severe classification time constraints, corresponding to 1-1000 CPU microseconds, using the Convolutional Tables Ensemble (CTE), an inherently fast architecture for object category recognition. The architecture is based on convolutionally applied sparse feature extraction, using trees or ferns, followed by a linear voting layer. Several structure and optimization variants are considered, including novel decision functions, a tree learning algorithm, and distillation from a CNN to the CTE architecture. Accuracy improvements of 24-45% over related art of similar speed are demonstrated on standard object recognition benchmarks. Using Pareto speed-accuracy curves, we show that CTE can provide better accuracy than Convolutional Neural Networks (CNN) for a certain range of classification time constraints, or alternatively provide similar error rates with a 5-200X speedup.
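To make the architecture in the abstract concrete, the following is a minimal NumPy sketch of one way a fern-based convolutional table ensemble could compute class scores. It is an illustration under stated assumptions, not the authors' implementation: the bit function (a thresholded difference of two pixels inside the patch), the position-independent voting tables, and the names `fern_codes` and `cte_scores` are all hypothetical choices made for this example.

```python
import numpy as np


def fern_codes(image, offsets, thresholds, patch):
    """Apply one fern convolutionally and return a K-bit code per patch position.

    Assumed bit function (hypothetical): bit k is 1 when the difference between
    two pixels inside the patch exceeds thresholds[k].
    offsets: (K, 4) ints (dy1, dx1, dy2, dx2), each in [0, patch).
    """
    image = np.asarray(image, dtype=np.float64)  # avoid unsigned underflow
    h, w = image.shape
    out_h, out_w = h - patch + 1, w - patch + 1
    codes = np.zeros((out_h, out_w), dtype=np.int64)
    for k, ((dy1, dx1, dy2, dx2), t) in enumerate(zip(offsets, thresholds)):
        # Shifted views give the two compared pixels for every patch position.
        a = image[dy1:dy1 + out_h, dx1:dx1 + out_w]
        b = image[dy2:dy2 + out_h, dx2:dx2 + out_w]
        bit = (a - b > t).astype(np.int64)
        codes |= bit << k
    return codes


def cte_scores(image, ferns, tables, patch):
    """Sum table votes over all ferns and positions (a linear voting layer).

    ferns: list of (offsets, thresholds) pairs.
    tables: list of (2**K, n_classes) weight arrays, one per fern. Sharing a
    table across positions is a simplification assumed for this sketch.
    """
    scores = np.zeros(tables[0].shape[1])
    for (offsets, thresholds), table in zip(ferns, tables):
        codes = fern_codes(image, offsets, thresholds, patch)
        # Each code word selects one table row; the rows are summed as votes.
        scores += table[codes.ravel()].sum(axis=0)
    return scores


# Usage with random, untrained parameters (illustrative only).
rng = np.random.default_rng(0)
patch, n_bits, n_ferns, n_classes = 5, 4, 3, 10
image = rng.random((28, 28))
ferns = [(rng.integers(0, patch, size=(n_bits, 4)),
          rng.normal(0.0, 0.1, size=n_bits)) for _ in range(n_ferns)]
tables = [rng.normal(size=(2 ** n_bits, n_classes)) for _ in range(n_ferns)]
predicted_class = int(np.argmax(cte_scores(image, ferns, tables, patch)))
```

The sketch shows why such an architecture can be fast: each patch position costs only K comparisons and one table lookup per fern, and the final score is a sum of looked-up rows, so no dense multiply-accumulate over feature maps is needed.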
