Classification of photographed document images based on deep-learning features

In this paper, we propose two new problems in the classification of photographed document images and present baseline solutions to both based on deep learning. The first problem asks which book a given set of photographed document images belongs to. The second asks what type of book those images belong to. To address these two problems, we apply "AlexNet" to the collected document images. Using "AlexNet" pre-trained on the ImageNet data set directly, we obtain 92.57% accuracy for book-name classification and 93.33% accuracy for book-type classification. After fine-tuning on the training set of photographed document images, the accuracy of book-name classification increases to 95.54% and that of book-type classification to 95.42%. To the best of our knowledge, although many image classification algorithms exist, no previous work has targeted these two challenging problems. In addition, the experiments demonstrate that deep-learning features outperform features extracted with traditional image descriptors on these two problems.
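
The feature-then-classify pipeline described above can be illustrated with a minimal sketch. It assumes that every image has already been reduced to a fixed-length feature vector (for example, activations taken from a pre-trained network) and uses a nearest-centroid classifier as a simple stand-in; the abstract does not state which classifier the authors place on top of the features. The function names, the labels `book_a` and `book_b`, and the toy vectors are all hypothetical and serve only to show how an accuracy figure of this kind is computed.

```python
from collections import defaultdict
from math import dist


def fit_centroids(features, labels):
    """Average the feature vectors of each class into one centroid."""
    groups = defaultdict(list)
    for vec, label in zip(features, labels):
        groups[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in groups.items()
    }


def predict(centroids, vec):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], vec))


def accuracy(centroids, features, labels):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(predict(centroids, v) == y for v, y in zip(features, labels))
    return hits / len(labels)


# Hypothetical 3-dimensional toy features; real network activations
# would be much longer vectors, one per photographed page.
train_x = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.1, 0.9, 0.2], [0.0, 0.8, 0.3]]
train_y = ["book_a", "book_a", "book_b", "book_b"]
test_x = [[0.85, 0.15, 0.05], [0.05, 0.85, 0.25]]
test_y = ["book_a", "book_b"]

centroids = fit_centroids(train_x, train_y)
print(accuracy(centroids, test_x, test_y))
```

The same evaluation applies to both tasks: only the label attached to each image changes, from the book name to the book type.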
