Line Segmentation from Unconstrained Handwritten Text Images using Adaptive Approach

Line segmentation from handwritten text images is a challenging task because of the diversity of handwriting and unpredictable variations such as irregular spacing, styles, orientations, stroke heights, overlapping, and alignments. Despite abundant research, improvement is still needed to achieve robustness and higher segmentation rates. In the present work, an adaptive approach to line segmentation from handwritten text images is used that combines the alignment of connected-component coordinates with the text height. A mathematical justification is provided for measuring the text height relative to the image size. The novelty of the work lies in calculating the text height dynamically. The experiments are performed on a dataset provided by a Chinese company for the project. The proposed scheme is tested on two different types of datasets: document pages with baselines and plain pages. The dataset is highly complex and contains abundant and uncommon variations in handwriting patterns. The performance of the proposed method is evaluated on our datasets as well as on the benchmark datasets IAM and ICDAR09, achieving an average detection rate of 98.01%. On the above-mentioned datasets, detection rates of 91.99% and 96% are observed, respectively.
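The abstract does not state the exact formulas, so the following is only a minimal Python sketch of the general idea of grouping connected components into lines by vertical alignment, scaled by an estimated text height. It is not the authors' method: it assumes a binarized image given as a list of 0/1 rows, 8-connectivity, the median component height as a swapped-in stand-in for the paper's dynamic text-height calculation (which relates text height to image size), and a hypothetical `tolerance` of half the text height. All function names are hypothetical.

```python
from collections import deque
from statistics import median


def connected_components(img):
    """Label 8-connected foreground (1) pixels and return one bounding box
    (top, left, bottom, right) per component, in raster-scan discovery order."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue = deque([(r, c)])
                top, left, bottom, right = r, c, r, c
                while queue:  # breadth-first flood fill of one component
                    y, x = queue.popleft()
                    top, bottom = min(top, y), max(bottom, y)
                    left, right = min(left, x), max(right, x)
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and img[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes


def estimate_text_height(boxes):
    """Hypothetical heuristic: median bounding-box height of all components."""
    return median(b[2] - b[0] + 1 for b in boxes)


def group_into_lines(boxes, tolerance=0.5):
    """Assign components to lines by vertical-centre alignment: a component
    joins the current line if its centre lies within tolerance * text height
    of that line's mean centre; otherwise it starts a new line."""
    if not boxes:
        return []
    height = estimate_text_height(boxes)
    ordered = sorted(boxes, key=lambda b: (b[0] + b[2]) / 2)
    lines = [[ordered[0]]]
    for box in ordered[1:]:
        centre = (box[0] + box[2]) / 2
        line = lines[-1]
        line_centre = sum((b[0] + b[2]) / 2 for b in line) / len(line)
        if abs(centre - line_centre) <= tolerance * height:
            line.append(box)
        else:
            lines.append([box])
    # Order the components inside each line from left to right.
    return [sorted(line, key=lambda b: b[1]) for line in lines]


# Toy "page" with two rows of two blobs each.
page = [
    [1, 1, 0, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 1, 1],
    [0, 1, 1, 0, 0, 0, 1, 1],
]
lines = group_into_lines(connected_components(page))
print(len(lines))  # → 2
```

Comparing each component against the running mean centre of the current line, rather than against the previous component alone, is one simple way to tolerate small vertical jitter within a handwritten line; real pages with skew, touching lines, or diacritics would need more than this sketch provides.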
