Writer-Aware CNN for Parsimonious HMM-Based Offline Handwritten Chinese Text Recognition

Recently, the hybrid convolutional neural network hidden Markov model (CNN-HMM) has been introduced for offline handwritten Chinese text recognition (HCTR) and has achieved state-of-the-art performance. However, modeling each character in the large Chinese vocabulary with a uniform, fixed number of hidden states incurs high memory and computational costs and makes the resulting tens of thousands of HMM state classes easily confusable. Another key issue of CNN-HMM for HCTR is the diversity of writing styles, which strains the model and causes a significant performance decline for specific writers. To address these issues, we propose a writer-aware CNN based on parsimonious HMM (WCNN-PHMM). First, the PHMM is designed using a data-driven state-tying algorithm that greatly reduces the total number of HMM states. This not only yields a compact CNN, because states are shared among the same or similar radicals of different Chinese characters, but also improves recognition accuracy, owing to more accurate modeling of the tied states and lower confusion among them. Second, the WCNN integrates each convolutional layer with one adaptive layer fed by a writer-dependent vector, namely the writer code, to factor out writer-specific variability that is irrelevant to recognition and thereby improve performance. The parameters of the writer-adaptive layers are jointly optimized with the other network parameters in the training stage, while a multiple-pass decoding strategy is adopted to learn the writer code and generate recognition results. Validated on the ICDAR 2013 competition set of the CASIA-HWDB database, the more compact WCNN-PHMM with a 7360-class vocabulary achieves a relative character error rate (CER) reduction of 16.6% over the conventional CNN-HMM without language modeling. With a powerful hybrid language model (an N-gram language model combined with a recurrent neural network language model), the CER of WCNN-PHMM is reduced to 3.17%.
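
As a rough illustration of how an adaptive layer fed by a writer code might interact with a convolutional layer, the following minimal Python sketch maps the writer code to one bias per channel and adds it to the feature maps. The linear form, the per-channel bias, and all names (`adaptive_bias`, `apply_writer_adaptation`, `adapt_w`) are assumptions made for illustration only; the abstract does not specify the exact architecture, and training of the writer code is not shown.

```python
from typing import List

Matrix = List[List[float]]      # rows x cols
FeatureMaps = List[Matrix]      # channels x height x width


def adaptive_bias(weights: Matrix, writer_code: List[float]) -> List[float]:
    """Hypothetical linear adaptive layer: one bias per channel from the writer code."""
    return [sum(w * c for w, c in zip(row, writer_code)) for row in weights]


def apply_writer_adaptation(maps: FeatureMaps, weights: Matrix,
                            writer_code: List[float]) -> FeatureMaps:
    """Add the writer-dependent per-channel bias at every spatial position."""
    bias = adaptive_bias(weights, writer_code)
    return [[[v + b for v in row] for row in channel]
            for channel, b in zip(maps, bias)]


# Toy example (assumed sizes): 2 channels of 2x2 features, 3-dimensional writer code.
conv_out = [[[1.0, 2.0], [3.0, 4.0]],
            [[0.0, 0.0], [1.0, 1.0]]]
adapt_w = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 1.0]]
code = [0.5, -1.0, 0.25]

adapted = apply_writer_adaptation(conv_out, adapt_w, code)
```

In this sketch an all-zero writer code leaves the convolutional features unchanged, so under these assumptions the network falls back to writer-independent behavior until a code has been estimated for the writer.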
