An image database of handwritten Bangla words with automatic benchmarking facilities for character segmentation algorithms

Recognition of unconstrained handwritten word images is an interesting research problem, and it becomes more challenging when lexicon-free words are considered. A prerequisite for developing a lexicon-free handwritten word recognition technique is the segmentation of a word image into its constituent characters. A competent character segmentation technique is therefore required to design a comprehensive word recognition module. However, a study of the literature reveals that there is no standard word image database with ground truth information. As a result, most character segmentation algorithms found in the literature rely on self-made databases with manual evaluation. To fill this research need, the present work prepares a comprehensive database of handwritten Bangla word images, primarily for evaluating character segmentation algorithms. The work also provides two types of ground truth images related to the segmented character shapes of the word images. In addition, an evaluation tool is developed for assessing the performance of any character segmentation algorithm on the developed benchmark database. The benchmark result reported here is an F-score of 0.9212, which outperforms some state-of-the-art methods.
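For readers unfamiliar with the metric, the following is a minimal sketch of how an F-score can be computed from segmentation match counts. It assumes the balanced F1 form (the harmonic mean of precision and recall); the abstract does not state which variant the evaluation tool uses, nor how a predicted character segment is matched against the ground truth. The function name `f_score` and its count parameters are hypothetical and are not taken from the authors' tool.

```python
def f_score(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F-score) from segmentation match counts.

    Assumption: the counts come from some matching rule between predicted
    and ground-truth character segments; that rule is not specified here.
    """
    predicted = true_positives + false_positives   # segments the algorithm produced
    actual = true_positives + false_negatives      # segments in the ground truth
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-score (F1): harmonic mean of precision and recall.
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative counts only; these are not results from the paper.
    print(f_score(8, 2, 2))
```

Under this definition, a score near 1 requires both few spurious segments (high precision) and few missed segments (high recall), which is why a single F-score is a convenient summary for a benchmark.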
