Handwritten Optical Character Recognition system for Sindhi numerals

Sindhi is written in a script similar to that of Arabic and Persian. Its origin dates back about 2,500 years, and it is spoken in several countries in Asia. In this paper, we propose an Optical Character Recognition (OCR) system that recognizes handwritten Sindhi numeral expressions (i.e., handwritten Sindhi numeral strings) without using common input devices such as a keyboard and storage device memory. Our experiments focus on character recognition, which can later be used in applications such as tutoring, mathematical games for children, and automatic conversion of telephone numbers from signboards in India and Pakistan. We investigate the correlation between the numeral shapes and apply a well-known state-of-the-art classifier based on correlation-based template matching. We show experimentally that template matching performs poorly because the shapes of the numerals are highly correlated. A small body of literature addresses OCR for the Sindhi language, but the lack of a benchmark dataset makes it difficult for researchers around the world to re-implement the published frameworks. We provide two sets of images that can be used for training and prediction.
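As an illustration of the correlation-based template matching mentioned above, the following is a minimal sketch, not the implementation used in the paper. It assumes each numeral image has already been segmented and resized to the same dimensions as the templates. The function names `correlation` and `classify` and the toy 3x3 glyphs are hypothetical and stand in for real numeral templates.

```python
from math import sqrt


def correlation(a, b):
    """Pearson correlation coefficient between two equally sized 2-D images."""
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    if len(xs) != len(ys):
        raise ValueError("images must have the same size")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    # A constant image has zero variance; treat it as uncorrelated.
    return num / den if den else 0.0


def classify(image, templates):
    """Return (label, score) of the template that correlates best with image."""
    scores = {label: correlation(image, t) for label, t in templates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy 3x3 binary glyphs, purely illustrative (not real Sindhi numeral shapes).
TEMPLATES = {
    "vertical": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "horizontal": [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
}

# A vertical stroke with one extra noisy pixel in the bottom-right corner.
noisy = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]
label, score = classify(noisy, TEMPLATES)
```

In this sketch the decision rests entirely on which template yields the highest coefficient. When the templates themselves are highly correlated with one another, the top scores lie close together and small variations in handwriting can change the winner, which is consistent with the poor performance reported above.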
