Marginal Noise Reduction in Historical Handwritten Documents -- A Survey

This paper surveys approaches for removing marginal noise from document images and analyses the research challenges these methods face on handwritten historical datasets. It introduces historical documents collected from Australian Archives and Libraries and describes the layout complexities of those document images. Benchmarking of other historical databases related to this work is also discussed. The survey examines the difficulties and suitability of state-of-the-art methods for removing marginal noise while preserving the text content of handwritten historical documents. It helps researchers identify appropriate methods for the type of marginal noise present, and it illustrates the drawbacks of those methods in order to suggest directions for developing more general and robust approaches that work across datasets.
