Can You Read Me Now? Content Aware Rectification using Angle Supervision

The ubiquity of smartphone cameras means that more and more documents are photographed rather than scanned. Unlike flatbed scans, photographs often capture documents that are folded or crumpled, resulting in large local variance in text structure. Document rectification is fundamental to Optical Character Recognition (OCR) on such documents, and its ability to overcome geometric distortions significantly affects recognition accuracy. Despite great progress in recent OCR systems, most still rely on a pre-processing step that ensures the text lines are straight and axis-aligned. Recent works have tackled the rectification of document images taken in the wild using various supervision signals and alignment means. However, they focused on global features that can be extracted from the document's boundaries, ignoring signals that could be obtained from the document's content. We present CREASE: Content Aware Rectification using Angle Supervision, the first learned method for document rectification that relies on the document's content (the location of the words and, specifically, their orientation) as hints to assist the rectification process. We utilize a novel pixel-wise angle regression approach and a curvature estimation side-task to optimize our rectification model. Our method surpasses previous approaches in terms of OCR accuracy, geometric error, and visual similarity.
