Investigating coupling preprocessing with shallow and deep convolutional neural networks in document image classification

Abstract. Convolutional neural networks (CNNs) are effective for image classification, and increasingly deep CNNs are being used to improve classification performance. As demand grows for searchable access to vast collections of printed document images, powerful CNNs have been used in place of conventional image processing. However, the better performance of deep CNNs comes at the cost of greater computational complexity. Is the additional training effort required by deeper CNNs worth the improvement in performance? Or could a shallow CNN coupled with conventional image processing (e.g., binarization and consolidation) outperform deeper CNN-based solutions? We investigate performance gaps among shallow (LeNet-5, -7, and -9), deep (ResNet-18), and very deep (ResNet-152, MobileNetV2, and EfficientNet) CNNs on noisy printed document images, e.g., historical newspapers and document images in the RVL-CDIP repository. Our investigation considers two classification tasks: (1) identifying poems in historical newspapers and (2) classifying document images into 16 document types. Empirical results show that a shallow CNN coupled with computationally inexpensive preprocessing can remain robust when trained on significantly fewer samples; that deep CNNs coupled with preprocessing can outperform very deep CNNs in both effectiveness and efficiency; and that aggressive preprocessing is not helpful, as it can remove potentially useful information from document images.
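To make "computationally inexpensive preprocessing" concrete, the following is a minimal sketch of global binarization by Otsu's method, one common conventional choice. The abstract does not state which binarization algorithm is used, so the choice of Otsu here is an assumption, and the function names `otsu_threshold` and `binarize` are hypothetical. The image is assumed to be an 8-bit grayscale page stored as a list of rows of integers in 0..255.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0..255) that maximizes between-class variance.

    Pixels with value <= t form class 0 (dark), the rest form class 1 (light).
    This is a sketch of Otsu's global method, assumed here for illustration.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in class 0 so far
    sum0 = 0    # sum of gray levels in class 0 so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue        # class 0 still empty
        if w1 == 0:
            break           # class 1 empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(image):
    """Map a grayscale image (list of rows) to 0/255 using Otsu's threshold."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[255 if p > t else 0 for p in row] for row in image]


if __name__ == "__main__":
    # Tiny synthetic "page": dark ink (about 20) on a light background (about 220).
    page = [
        [220, 225, 20, 218],
        [25, 222, 219, 18],
    ]
    for row in binarize(page):
        print(row)
```

A single global threshold like this costs one pass over the histogram, which is why it is cheap relative to training a deeper network; it is also the kind of step that can discard faint but informative strokes if applied too aggressively.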
