Convolutional Tables Ensemble: classification in microseconds

We study classifiers operating under severe classification time constraints, corresponding to 1-1000 CPU microseconds, using Convolutional Tables Ensemble (CTE), an inherently fast architecture for object category recognition. The architecture is based on convolutionally-applied sparse feature extraction, using trees or ferns, and a linear voting layer. Several structure and optimization variants are considered, including novel decision functions, tree learning algorithm, and distillation from CNN to CTE architecture. Accuracy improvements of 24-45% over related art of similar speed are demonstrated on standard object recognition benchmarks. Using Pareto speed-accuracy curves, we show that CTE can provide better accuracy than Convolutional Neural Networks (CNN) for a certain range of classification time constraints, or alternatively provide similar error rates with 5-200X speedup.

[1]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[2]  Ivan V. Oseledets,et al.  Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition , 2014, ICLR.

[3]  Rich Caruana,et al.  Do Deep Nets Really Need to be Deep? , 2013, NIPS.

[4]  Yoram Singer,et al.  Pegasos: primal estimated sub-gradient solver for SVM , 2011, Math. Program..

[5]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[6]  Luc Van Gool,et al.  Pedestrian detection at 100 frames per second , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[8]  Hassan Foroosh,et al.  Sparse Convolutional Neural Networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Vincent Lepetit,et al.  A fast local descriptor for dense matching , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[11]  Dan Levi,et al.  Part-Based Feature Synthesis for Human Detection , 2010, ECCV.

[12]  Dan Levi,et al.  Fast Multiple-Part Based Object Detection Using KD-Ferns , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Jian Sun,et al.  Face Alignment at 3000 FPS via Regressing Local Binary Features , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Qiang Chen,et al.  Network In Network , 2013, ICLR.

[15]  S French,et al.  Multicriteria Analysis , 1998, J. Oper. Res. Soc..

[16]  Benjamin Klein,et al.  Discriminative Ferns Ensemble for Hand Pose Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Pietro Perona,et al.  Integral Channel Features , 2009, BMVC.

[18]  Benjamin Graham,et al.  Spatially-sparse convolutional neural networks , 2014, ArXiv.

[19]  Jian Sun,et al.  Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[20]  Simon Haykin,et al.  GradientBased Learning Applied to Document Recognition , 2001 .

[21]  Prabhat,et al.  Scalable Bayesian Optimization Using Deep Neural Networks , 2015, ICML.

[22]  Vincent Vanhoucke,et al.  Improving the speed of neural networks on CPUs , 2011 .

[23]  Yoshua Bengio,et al.  FitNets: Hints for Thin Deep Nets , 2014, ICLR.

[24]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[25]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Pierre Geurts,et al.  Extremely randomized trees , 2006, Machine Learning.

[27]  Andrea Vedaldi,et al.  MatConvNet: Convolutional Neural Networks for MATLAB , 2014, ACM Multimedia.

[28]  Chih-Jen Lin,et al.  A dual coordinate descent method for large-scale linear SVM , 2008, ICML '08.

[29]  Andrew Y. Ng,et al.  Reading Digits in Natural Images with Unsupervised Feature Learning , 2011 .

[30]  Peter L. Bartlett,et al.  Boosting Algorithms as Gradient Descent , 1999, NIPS.

[31]  Vincent Lepetit,et al.  Keypoint recognition using randomized trees , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32]  Daphna Weinshall,et al.  Object class recognition by boosting a part-based model , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[33]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[34]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[35]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[36]  Andrew W. Fitzgibbon,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[37]  Jonathon Shlens,et al.  Fast, Accurate Detection of 100,000 Object Classes on a Single Machine , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  J. Shotton,et al.  Decision Forests for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning , 2011 .

[39]  Andrew Zisserman,et al.  Image Classification using Random Forests and Ferns , 2007, 2007 IEEE 11th International Conference on Computer Vision.