Cascaded heterogeneous convolutional neural networks for handwritten digit recognition

This paper presents a handwritten digit recognition method based on cascaded heterogeneous convolutional neural networks (CNNs). Our method investigates the reliability and complementarity of heterogeneous CNNs. Each CNN recognizes the proportion of input samples it can classify with high confidence and passes the samples it rejects to the next CNN. Samples rejected by the last CNN are recognized by a voting committee of all CNNs. Experiments on the MNIST dataset show that our method achieves an error rate of 0.23% using only 5 CNNs, on par with the human visual system. Using heterogeneous networks can reduce the number of CNNs needed to reach a given level of performance, compared with using networks of a single type. Possible further improvements include fine-tuning the rejection threshold of each CNN and adding CNNs of more types.
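
The cascade-with-rejection scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `cascade_predict`, `classifiers`, and `thresholds` are hypothetical, confidence is assumed to be the largest class probability, and the committee is assumed to vote by averaging the class probabilities of all stages, since the abstract does not specify either rule.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical interface: a classifier maps a sample to class probabilities.
Classifier = Callable[[object], List[float]]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def cascade_predict(
    sample: object,
    classifiers: Sequence[Classifier],
    thresholds: Sequence[float],
) -> Tuple[int, Optional[int]]:
    """Return (label, stage), where stage is the index of the accepting
    classifier, or None if the committee of all classifiers decided."""
    if not classifiers or len(classifiers) != len(thresholds):
        raise ValueError("need one rejection threshold per classifier")

    all_probs: List[List[float]] = []
    for stage, (clf, threshold) in enumerate(zip(classifiers, thresholds)):
        probs = clf(sample)
        all_probs.append(probs)
        # Assumption: confidence is the top class probability.
        if max(probs) >= threshold:
            return argmax(probs), stage
        # Otherwise this stage rejects the sample; pass it to the next one.

    # Every stage rejected the sample: fall back to a committee.
    # Assumption: the committee averages the per-class probabilities.
    n_classes = len(all_probs[0])
    mean = [
        sum(p[c] for p in all_probs) / len(all_probs)
        for c in range(n_classes)
    ]
    return argmax(mean), None
```

In this sketch, a lower threshold lets an early stage accept more samples at the risk of more errors, while a higher threshold pushes more samples to later stages and to the committee; this is the trade-off that tuning the per-CNN rejection threshold would adjust.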
