Invariant representation for rectilinear rulings

Abstract. Ruling gap ratios are an affine-invariant characterization of parallel ruling configurations in scanned documents. This report quantifies the advantage of extracting horizontal and vertical rulings simultaneously, and demonstrates that every ruling gap ratio can be derived from a minimal set of basis ratios. It analyzes how noise in the radial coordinates of individual rulings affects the basis ratios, and determines how the basis-ratio variability caused by random-phase sampling noise depends on the spatial sampling rate. The analysis provides insight into previously reported small-scale experimental results on form classification, and offers guidance for future work that requires the extraction of parallel lines from scanned or photographed images.
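
The two structural claims above (invariance, and derivation of every gap ratio from a basis) can be illustrated with a minimal sketch. The abstract does not define the basis, so the definitions here are assumptions made only for illustration: a ruling is represented by its radial coordinate, a gap is the difference between adjacent sorted coordinates, and the basis is taken to be the ratios of consecutive gaps. All function names are hypothetical.

```python
import math

def gaps(rhos):
    """Gaps between adjacent rulings, given their radial coordinates."""
    s = sorted(rhos)
    return [b - a for a, b in zip(s, s[1:])]

def basis_ratios(rhos):
    """Assumed basis: ratio of each gap to the preceding gap (n rulings give n - 2 ratios)."""
    g = gaps(rhos)
    return [g[k + 1] / g[k] for k in range(len(g) - 1)]

def gap_ratio_from_basis(basis, i, j):
    """Recover g[j] / g[i] (0-based gap indices) as a product of basis ratios."""
    if i <= j:
        return math.prod(basis[i:j])
    return 1.0 / math.prod(basis[j:i])

# Four parallel rulings; the coordinates are made-up illustration values.
rhos = [10.0, 30.0, 70.0, 90.0]
basis = basis_ratios(rhos)

# A positive scaling plus an offset (a 1-D affine map) leaves the basis unchanged.
transformed = [3.0 * r + 7.0 for r in rhos]
basis_transformed = basis_ratios(transformed)
```

Under these assumptions, any ratio between two gaps is a product (or reciprocal of a product) of consecutive basis ratios, which is why n - 2 ratios suffice for n rulings. The sketch assumes a positive scale factor, so that the sorted order of the rulings is preserved.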
