Hindi Text Document Classification System Using SVM and Fuzzy: A Survey

In recent years, many information retrieval, character recognition, and feature extraction methodologies in Devanagari, and especially in Hindi, have been proposed for different domain areas. Given the enormous availability of scanned data, and to advance existing Hindi automated systems beyond optical character recognition, a new idea of a Hindi printed and handwritten document classification system using support vector machine and fuzzy logic is introduced. The system first pre-processes and then classifies textual imaged documents into predefined categories. With this concept, this article presents a feasibility study of such systems with relevance to Hindi, a survey report of statistical measurements of Hindi keywords obtained from different sources, and the inherent challenges found in printed and handwritten documents. Technical reviews are provided and graphically represented to compare many parameters and to estimate the contents, forms, and classifiers used in various existing techniques.

Keywords: Character Recognition, Fuzzy Logic, Handwritten Documents, Hindi Document Classification, Image Text Processing, Printed Documents, Statistical Analysis, SVM, Word Recognition
