Chapter 1: Introduction
1.1 Motivation
1.2 Objectives
1.3 Contribution Summary
1.4 Thesis Outline
Chapter 2: Background Analysis
2.1 Some Properties of Bangla Characters
2.2 The Tesseract OCR
Chapter 3: Proposed Methodology
3.1 Workflow
3.2 Proposed Diagram
Chapter 4: Implementation
4.1 Required Tools and Programming Language
4.2 Preparing Training Data
4.3 Training Procedure for Bangla for Tesseract OCR Engine
4.3.1 Generate Training Images
4.3.2 Prepare Box File
4.3.3 Prepare Training File
[1] Jalal Mahmud et al., "A complete OCR system for continuous Bengali characters," TENCON 2003. Conference on Convergent Technologies for Asia-Pacific Region, 2003.
[2] Minhaz Fahim Zibran et al., "Computer Representation of Bangla Characters and Sorting of Bangla Words," 2008.
[3] Chowdhury Mofizur Rahman et al., "Optical Character Recognition of Bangla Characters using neural network: A better approach," 2005.
[4] S. M. Murtoza Habib et al., "Segmentation free Bangla OCR using HMM: Training and recognition," 2007.
[5] Bidyut Baran Chaudhuri et al., "A complete printed Bangla OCR system," Pattern Recognit., 1998.
[6] Nasreen Akter et al., "Development of a Recognizer for Bangla Text: Present Status and Future Challenges," 2010.
[7] Md Saiful Islam et al., "Implementation of an Optical Character Reader (OCR) for Bengali language," 2015 International Conference on Data and Software Engineering (ICoDSE), 2015.
[8] H. Sarwar et al., "An Algorithm for Segmenting Modifiers from Bangla Text," 2008 11th International Conference on Computer and Information Technology, 2008.
[9] Mumit Khan et al., "Integrating Bangla script recognition support in tesseract OCR," 2009.
[10] S. M. Murtoza Habib et al., "A High Performance Domain Specific OCR For Bangla Script," 2008.
[11] Bidyut Baran Chaudhuri et al., "OCR error detection and correction of an inflectional Indian language script," Proceedings of 13th International Conference on Pattern Recognition, 1996.
[12] Bidyut Baran Chaudhuri et al., "OCR in Bangla: an Indo-Bangladeshi language," Proceedings of the 12th IAPR International Conference on Pattern Recognition, Vol. 3 - Conference C: Signal Processing (Cat. No.94CH3440-5), 1994.
[13] Fumitaka Kimura et al., "Bangla Handwritten Character Recognition," IICAI, 2005.