DocScanner: Robust Document Image Rectification with Progressive Learning

Compared with flatbed scanners, smartphones are far more convenient for digitizing physical documents. However, the resulting images are often distorted by uncontrolled physical deformations, camera positions, and illumination variations. To address this, this work presents DocScanner, a new deep network architecture for document image rectification. Unlike existing methods, DocScanner introduces a progressive learning mechanism: it maintains a single estimate of the rectified image, which is progressively corrected by a recurrent architecture. The iterative refinements let DocScanner converge to robust, superior performance, while the lightweight recurrent architecture keeps it efficient at run time. In addition, because prior works often produce corrupted boundaries in the rectified image, DocScanner applies a document localization module before rectification to explicitly segment the foreground document from the cluttered background. To further improve rectification quality, a geometric regularization based on the geometric prior between the distorted and rectified images is introduced during training. Extensive experiments on the Doc3D dataset and the DocUNet benchmark dataset verify, both quantitatively and qualitatively, the effectiveness of DocScanner, which outperforms previous methods on OCR accuracy, image similarity, and our proposed distortion metric by a considerable margin. Furthermore, DocScanner is the most efficient in terms of inference time and parameter count.
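The progressive mechanism described above, a single estimate that is repeatedly corrected, can be illustrated with a toy sketch. This is not the DocScanner network: the names `refine`, `toy_update`, and `target` are hypothetical, and the learned recurrent update is replaced by a hand-written function that returns half of the remaining error, purely to show the control flow of iterative residual refinement.

```python
def refine(estimate, predict_residual, iterations):
    """Keep one running estimate and correct it over several iterations."""
    history = []
    for _ in range(iterations):
        # In a real model the residual would come from a learned recurrent unit.
        residual = predict_residual(estimate)
        estimate = [e + r for e, r in zip(estimate, residual)]
        history.append(list(estimate))
    return estimate, history


# Hypothetical stand-in for the learned update: it moves the estimate
# halfway toward a known target (an assumption made only for illustration).
target = [2.0, -1.0, 0.5]


def toy_update(estimate, step=0.5):
    return [step * (t - e) for t, e in zip(target, estimate)]


final, history = refine([0.0, 0.0, 0.0], toy_update, iterations=8)

# Largest remaining error after each iteration; it shrinks at every step.
errors = [max(abs(t - e) for t, e in zip(target, h)) for h in history]
print(errors)
```

In this toy setting the error halves at each iteration, so the number of iterations trades accuracy against compute, which is the kind of trade-off a lightweight recurrent design is meant to keep cheap.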
