Separating printed from handwritten text by analysing the linearity and regularity of the text

ABSTRACT. This article proposes a method for separating handwritten from printed text in documents. The method relies on original descriptors from two categories, linearity and regularity, both invariant to translation and scale. More precisely, we derive a linearity measure from the histogram of segment lengths. The resulting framework is independent of the document layout and of the Latin-script language used, and it is computationally efficient. Evaluated on real documents, it reaches a recognition rate above 90%.
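As an illustration only, the idea of reading a linearity measure off a histogram of segment lengths can be sketched as follows. The abstract does not specify how segments are extracted, how the histogram is binned, or how the final score is computed, so everything here is an assumption: the input is taken to be a polyline (for instance, a polygonal approximation of a stroke contour), lengths are divided by the total polyline length to obtain scale invariance, and the hypothetical `linearity` score is the mean relative length taken from the bin centres.

```python
from math import hypot


def segment_lengths(polyline):
    """Lengths of consecutive segments of a polyline given as (x, y) points.

    Only coordinate differences are used, so the result does not change
    when the polyline is translated.
    """
    return [hypot(x2 - x1, y2 - y1)
            for (x1, y1), (x2, y2) in zip(polyline, polyline[1:])]


def length_histogram(lengths, n_bins=8):
    """Normalised histogram of segment lengths relative to the total length.

    Dividing by the total length makes the histogram scale invariant
    (an assumption of this sketch, not a detail taken from the article).
    """
    total_length = sum(lengths)
    hist = [0] * n_bins
    for length in lengths:
        # Relative length lies in (0, 1]; a value of 1.0 goes to the last bin.
        index = min(int(length / total_length * n_bins), n_bins - 1)
        hist[index] += 1
    return [count / len(lengths) for count in hist]


def linearity(lengths, n_bins=8):
    """Hypothetical score: mean relative segment length from the histogram.

    Few long segments (straight strokes) give a high score; many short
    segments (curved strokes) give a low one.
    """
    hist = length_histogram(lengths, n_bins)
    centers = [(i + 0.5) / n_bins for i in range(n_bins)]
    return sum(h * c for h, c in zip(hist, centers))


# Two long straight segments versus a zigzag of ten short ones.
straight = [(0, 0), (10, 0), (10, 10)]
zigzag = [(i, i % 2) for i in range(11)]

straight_score = linearity(segment_lengths(straight))
zigzag_score = linearity(segment_lengths(zigzag))
```

In this toy setting the straight polyline scores higher than the zigzag, and scaling or shifting either polyline leaves its score unchanged. A real system would need a segment extraction step and a trained classifier, neither of which is shown here.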
