Southeast Asian palm leaf manuscript images: a review of handwritten text line segmentation methods and new challenges

Abstract. Because of their specific characteristics, palm leaf manuscripts pose new challenges for text line segmentation in document analysis. We investigated the performance of six text line segmentation methods through comparative experiments on a collection of palm leaf manuscript images. The image corpus consists of sample images of palm leaf manuscripts in three Southeast Asian scripts: Balinese script from Bali and Sundanese script from West Java, both in Indonesia, and Khmer script from Cambodia. Four of the tested methods work on binary images: the adaptive partial projection line segmentation approach, the A* path planning approach, the shredding method, and our proposed energy function for the shredding method. The two other methods can be applied directly to grayscale images: the adaptive local connectivity map method and the seam carving-based method. The evaluation criteria and tool provided by the ICDAR 2013 Handwriting Segmentation Contest were used in these experiments.
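As an illustration of the kind of evaluation criteria referred to above, the following is a minimal sketch of a one-to-one matching evaluation for text lines. It is not the contest tool itself. It assumes that each text line is given as a set of foreground pixel coordinates, that a ground-truth line and a detected line count as a match when their intersection-over-union reaches an acceptance threshold (0.95 is assumed here), and that detection rate (DR), recognition accuracy (RA), and their harmonic mean (FM) are derived from the number of matches. The function names `match_score` and `evaluate_lines` are hypothetical.

```python
def match_score(gt_pixels, res_pixels):
    """Intersection-over-union of two sets of (row, col) foreground pixels."""
    union = gt_pixels | res_pixels
    if not union:
        return 0.0
    return len(gt_pixels & res_pixels) / len(union)


def evaluate_lines(gt_lines, res_lines, threshold=0.95):
    """Return (DR, RA, FM) for a list of ground-truth and detected lines.

    Assumption: lines within each list do not share pixels, so with a
    threshold above 0.5 a simple greedy pairing finds every one-to-one match.
    """
    used = set()  # indices of detected lines already matched
    one_to_one = 0
    for gt in gt_lines:
        for j, res in enumerate(res_lines):
            if j not in used and match_score(gt, res) >= threshold:
                used.add(j)
                one_to_one += 1
                break
    dr = one_to_one / len(gt_lines) if gt_lines else 0.0    # detection rate
    ra = one_to_one / len(res_lines) if res_lines else 0.0  # recognition accuracy
    fm = 2 * dr * ra / (dr + ra) if (dr + ra) > 0 else 0.0  # harmonic mean
    return dr, ra, fm


# Toy example: two ground-truth lines of 100 pixels each.
gt = [{(0, c) for c in range(100)}, {(1, c) for c in range(100)}]
# The first detected line is exact; the second covers only half of its line.
res = [{(0, c) for c in range(100)}, {(1, c) for c in range(50)}]
dr, ra, fm = evaluate_lines(gt, res)
```

In this toy case only the first line is matched, so one of two ground-truth lines is detected and one of two detected lines is correct.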
