Preparation of an Unconstrained Vietnamese Online Handwriting Database and Recognition Experiments by Recurrent Neural Networks

This paper presents our effort to collect and analyze unconstrained Vietnamese online handwritten text patterns on pen-based computers. In total, our database contains over 120,000 strokes from more than 140,000 characters, making it one of the largest Vietnamese online handwriting pattern databases currently available. To build and analyze the database, we developed a collection tool, a line segmentation tool, and a delayed stroke detection tool. We also derived statistics from the writers' personal information. To address the unconstrained handwriting recognition problem, we conducted experiments using Bidirectional Long Short-Term Memory (BLSTM) networks, a Recurrent Neural Network (RNN) architecture that has recently been applied to many related problems. The BLSTM network achieves nearly 80% accuracy on our database, even though the database contains many delayed strokes. In the near future, we plan to make the database available for research purposes, as a foundation for handwriting recognition research.
