A database of unconstrained Vietnamese online handwriting and recognition experiments by recurrent neural networks

A Vietnamese online handwriting database is created and analyzed. Vietnamese online handwritten text poses a challenge because of its many delayed strokes. Long Short-Term Memory neural networks are effective at processing delayed strokes.

We present our efforts to create a database of unconstrained Vietnamese online handwritten text sampled from pen-based devices. The database stores handwritten text for paragraphs, lines, words, and characters, with the ground truth associated with every paragraph and line. We present a detailed statistical analysis of the handwritten text in this database and describe recognition experiments using several recent methods, including the Bidirectional Long Short-Term Memory (BLSTM) network. Overall, our database contains over 480,000 strokes from more than 380,000 characters, which makes it, at present, the largest database of Vietnamese online handwritten text. Although Vietnamese script is based on a fixed set of alphabet letters, recognizing Vietnamese online handwritten text is difficult because of its many diacritical marks, which usually result in delayed strokes during writing. We designed and implemented an online handwriting-collection tool to gather data, as well as a line-segmentation tool and a delayed-stroke-detection tool to analyze the collected handwritten text. We also conducted a statistical analysis based on the writer profiles. We applied a number of state-of-the-art recognition methods to unconstrained Vietnamese handwriting to evaluate their performance, including the BLSTM network, an efficient architecture derived from the Recurrent Neural Network (RNN) that is often applied to sequence labeling problems. The BLSTM network achieved 90% character recognition accuracy, despite many long sequences with several delayed strokes. Our database is openly accessible for research, to stimulate the development of handwriting research and technology.
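The abstract does not say how the delayed-stroke-detection tool works. As a rough illustration of the idea only, the sketch below uses a common heuristic for left-to-right scripts: a stroke is flagged as delayed when it ends to the left of the rightmost point already reached by earlier, non-delayed strokes. The function name `find_delayed_strokes`, the `margin` parameter, and the stroke representation (a list of `(x, y)` points) are assumptions made for this example, not the authors' implementation.

```python
from typing import List, Tuple

Point = Tuple[float, float]   # (x, y) pen coordinate; hypothetical representation
Stroke = List[Point]          # one pen-down to pen-up trace


def find_delayed_strokes(strokes: List[Stroke], margin: float = 0.0) -> List[int]:
    """Return indices of strokes flagged as delayed.

    Heuristic (an assumption, not the paper's method): in left-to-right
    writing, a stroke whose rightmost x lies more than `margin` to the left
    of the rightmost x reached by earlier non-delayed strokes was probably
    added after the writer moved on, e.g. a diacritical mark.
    """
    delayed: List[int] = []
    max_x_so_far = float("-inf")
    for index, stroke in enumerate(strokes):
        if not stroke:
            continue  # skip empty traces
        stroke_max_x = max(x for x, _ in stroke)
        if stroke_max_x < max_x_so_far - margin:
            delayed.append(index)
        else:
            # Only non-delayed strokes advance the writing frontier.
            max_x_so_far = max(max_x_so_far, stroke_max_x)
    return delayed


# Toy line: two word bodies, a mark added back over the first, then a third body.
example_strokes = [
    [(0.0, 0.0), (10.0, 0.0)],
    [(12.0, 0.0), (20.0, 0.0)],
    [(3.0, 8.0), (5.0, 9.0)],     # written late, far left of the frontier
    [(22.0, 0.0), (30.0, 0.0)],
]
print(find_delayed_strokes(example_strokes))  # prints [2]
```

A larger `margin` makes the heuristic more conservative, so that marks written immediately after their base letter are not flagged. A real tool would likely also consider vertical position and stroke size.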
