Speeding up Convolutional Neural Networks with Low Rank Expansions

The focus of this paper is speeding up the application of convolutional neural networks. While delivering impressive results across a range of computer vision and machine learning tasks, these networks are computationally demanding, limiting their deployability. Convolutional layers generally consume the bulk of the processing time, and so in this work we present two simple schemes for drastically speeding up these layers. This is achieved by exploiting cross-channel or filter redundancy to construct a low rank basis of filters that are rank-1 in the spatial domain. Our methods are architecture agnostic, and can be easily applied to existing CPU and GPU convolutional frameworks for tuneable speedup performance. We demonstrate this with a real world network designed for scene text character recognition [15], showing a possible 2.5× speedup with no loss in accuracy, and 4.5× speedup with less than 1% drop in accuracy, still achieving state-of-the-art on standard benchmarks.

[1]  S.M. Lucas,et al.  ICDAR 2005 text locating competition results , 2005, Eighth International Conference on Document Analysis and Recognition (ICDAR'05).

[2]  Till Quack,et al.  Large scale mining and retrieval of visual data in a multimodal context , 2008 .

[3]  Honglak Lee,et al.  Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations , 2009, ICML '09.

[4]  Manik Varma,et al.  Character Recognition in Natural Images , 2009, VISAPP.

[5]  Peter I. Corke,et al.  Using text-spotting to query the world , 2010, 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[6]  Y-Lan Boureau,et al.  Learning Convolutional Feature Hierarchies for Visual Recognition , 2010, NIPS.

[7]  Jiri Matas,et al.  A Method for Text Localization and Recognition in Real-World Images , 2010, ACCV.

[8]  Kai Wang,et al.  End-to-end scene text recognition , 2011, 2011 International Conference on Computer Vision.

[9]  Koen E. A. van de Sande,et al.  Segmentation as selective search for object recognition , 2011, 2011 International Conference on Computer Vision.

[10]  Andreas Dengel,et al.  ICDAR 2011 Robust Reading Competition Challenge 2: Reading Text in Scene Images , 2011, 2011 International Conference on Document Analysis and Recognition.

[11]  Jiri Matas,et al.  Text Localization in Real-World Images Using Efficiently Pruned Exhaustive Search , 2011, 2011 International Conference on Document Analysis and Recognition.

[12]  Vincent Lepetit,et al.  Are sparse representations really relevant for image classification? , 2011, CVPR 2011.

[13]  Berin Martini,et al.  Large-Scale FPGA-based Convolutional Networks , 2011 .

[14]  Vincent Vanhoucke,et al.  Improving the speed of neural networks on CPUs , 2011 .

[15]  Nitish Srivastava,et al.  Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.

[16]  Tao Wang,et al.  End-to-end text recognition with convolutional neural networks , 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).

[17]  Trevor Darrell,et al.  Sparselet Models for Efficient Multiclass Object Detection , 2012, ECCV.

[18]  Christophe Garcia,et al.  Simplifying ConvNets for Fast Learning , 2012, ICANN.

[19]  Yann LeCun,et al.  Scene parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers , 2012, ICML.

[20]  Harald Sack,et al.  A framework for improved video text detection and recognition , 2014, Multimedia Tools and Applications.

[21]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[22]  Jiřı́ Matas,et al.  Real-time scene text localization and recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Hartmut Neven,et al.  PhotoOCR: Reading Text in Uncontrolled Conditions , 2013, 2013 IEEE International Conference on Computer Vision.

[24]  Trevor Darrell,et al.  Discriminatively Activated Sparselets , 2013, ICML.

[25]  Vincent Lepetit,et al.  Learning Separable Filters , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Jiri Matas,et al.  Scene Text Localization and Recognition with Oriented Stroke Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[27]  Yoshua Bengio,et al.  Maxout Networks , 2013, ICML.

[28]  Misha Denil,et al.  Predicting Parameters in Deep Learning , 2014 .

[29]  Jon Almazán,et al.  ICDAR 2013 Robust Reading Competition , 2013, 2013 12th International Conference on Document Analysis and Recognition.

[30]  Christian Szegedy,et al.  DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Andrew Zisserman,et al.  Deep Features for Text Spotting , 2014, ECCV.

[32]  R. Fergus,et al.  OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[33]  Forrest N. Iandola,et al.  DenseNet: Implementing Efficient ConvNet Descriptor Pyramids , 2014, ArXiv.

[34]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[35]  Ming Yang,et al.  DeepFace: Closing the Gap to Human-Level Performance in Face Verification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Yann LeCun,et al.  Fast Training of Convolutional Networks through FFTs , 2013, ICLR.

[37]  Joelle Pineau,et al.  End-to-End Text Recognition with Hybrid HMM Maxout Models , 2013, ICLR.

[38]  Yaroslav Bulatov,et al.  Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks , 2013, ICLR.

[39]  Ivan Laptev,et al.  Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Andrew Zisserman,et al.  Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition , 2014, ArXiv.

[41]  Joan Bruna,et al.  Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation , 2014, NIPS.