Bangla text extraction by digital image processing

Chapter 1: Introduction
    1.1 Motivation
    1.2 Objectives
    1.3 Contribution Summary
    1.4 Thesis Outline
Chapter 2: Background Analysis
    2.1 Some Properties of Bangla Characters
    2.2 The Tesseract OCR
Chapter 3: Proposed Methodology
    3.1 Workflow
    3.2 Proposed Diagram
Chapter 4: Implementation
    4.1 Required Tools and Programming Language
    4.2 Preparing Training Data
    4.3 Training Procedure for Bangla for Tesseract OCR Engine
        4.3.1 Generate Training Images
        4.3.2 Prepare Box File
        4.3.3 Prepare Training File
