POS Tagging and Structural Annotation of Handwritten Text Image Corpus of Devnagari Script

Natural Language Processing (NLP) applications require large, annotated benchmark datasets. Handwritten and printed text corpora play an important role in benchmarking pattern recognition algorithms. Part-of-speech (POS) tagging is one of the most common and dominant types of annotation, because it underpins many other linguistic annotations such as lemmatization, syntactic parsing, and semantic annotation. This paper describes the POS tagging and structural annotation of a handwritten text image corpus of Devnagari script, comprising 1300 handwritten forms collected from different geographical locations and demographics.
