Deep Neural Networks for Page Stream Segmentation and Classification

In this manuscript we propose a novel method for joint page stream segmentation and multi-page document classification. The end goal is to classify the pages of a stream into different document classes. We take advantage of recent state-of-the-art results achieved with deep architectures in related fields such as document image classification, and we adopt similar models to obtain satisfactory classification accuracy at low computational cost. Our contribution is twofold: first, visual features are extracted from the processed documents automatically by the chosen Convolutional Neural Network; second, the predictions of the same network are refined by an additional deep model that processes them in a classic sliding-window manner, helping to find and correct classification errors made by the first network. The proposed pipeline has been evaluated on a publicly available dataset of more than half a million multi-page documents collected by an online loan comparison company, showing excellent results and high efficiency.
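To make the two-stage idea concrete, here is a minimal sketch. It assumes that the first network outputs one class-probability vector per page, and that the refinement model sees the predictions of the k pages on either side of the current page, with zero padding at the stream boundaries. The window size k and the padding scheme are assumptions, not details taken from the manuscript. The functions `sliding_windows` and `placeholder_refine` are hypothetical. `placeholder_refine` is a non-learned stand-in for the additional deep model: it sums the probabilities over the window and takes the arg-max. It is included only to show how page context can overturn an isolated error.

```python
from typing import List, Sequence


def sliding_windows(page_probs: Sequence[Sequence[float]], k: int = 1) -> List[List[float]]:
    """For each page i, concatenate the class-probability vectors of pages
    i-k .. i+k. Positions outside the stream are zero-padded (an assumption)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    n = len(page_probs)
    if n == 0:
        return []
    num_classes = len(page_probs[0])
    pad = [0.0] * num_classes
    windows: List[List[float]] = []
    for i in range(n):
        feat: List[float] = []
        for j in range(i - k, i + k + 1):
            feat.extend(page_probs[j] if 0 <= j < n else pad)
        windows.append(feat)
    return windows


def placeholder_refine(window: Sequence[float], num_classes: int) -> int:
    """Hypothetical, non-learned stand-in for the second deep model:
    sum each class's probability across the window and return the arg-max."""
    totals = [0.0] * num_classes
    for idx, p in enumerate(window):
        totals[idx % num_classes] += p
    return max(range(num_classes), key=lambda c: totals[c])


if __name__ == "__main__":
    # Toy stream: 3 pages, 2 classes; the middle page looks like an error.
    probs = [[0.9, 0.1], [0.4, 0.6], [0.8, 0.2]]
    first_pass = [max(range(2), key=lambda c: p[c]) for p in probs]
    refined = [placeholder_refine(w, 2) for w in sliding_windows(probs, k=1)]
    print(first_pass)  # [0, 1, 0]
    print(refined)     # [0, 0, 0]
```

In the example, the middle page is labelled class 1 by the first pass and relabelled class 0 because of its neighbours. A fixed rule like this would also erase genuine one-page documents. That limitation is one reason to learn the refinement with a deep model rather than hand-code it.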
