Word Recognition in Captured Images by CNN Trained with Synthetic Images

Problems like robotic navigation and automatic geocoding of businesses require artificial agents to perform rapid and accurate word recognition in natural images. We set out to develop a deep learning method that recognizes words from different languages in captured images with high accuracy and with a small number of captured samples. Our experiments yield three main findings. First, we feed word images directly to the neural network, omitting the segmentation and postprocessing steps to avoid compounding errors; this approach works well on our samples. Second, we are able to train machine learning models to recognize words using purely synthetic training samples by applying the same feature extraction to both the training and testing datasets before passing them through the deep network. This allows us to train neural networks cheaply on synthetic data and transfer that knowledge to recognize words in real data. Third, we set up experiments comparing model performance when using Canny edge detection versus Chu's 3-D thinning algorithm as the preprocessing method, and found that Canny edge detection performs better in most cases.
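The second finding above rests on applying one shared feature-extraction transform to both the synthetic training images and the real test images, so that both land in a common edge domain before reaching the network. The sketch below illustrates that idea. It is a minimal, dependency-free stand-in: the paper uses Canny edge detection (e.g., OpenCV's `cv2.Canny`) and a 3-D thinning algorithm, whereas this example uses a simple Sobel gradient-magnitude edge map; the function names and the threshold value are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def _conv3x3(img, kernel):
    """Valid-mode 3x3 cross-correlation, implemented with array slicing."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.float64)
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * img[i:i + h - 2, j:j + w - 2]
    return out

def edge_map(img, threshold=0.25):
    """Simplified edge-based preprocessing (a stand-in for Canny):
    Sobel gradient magnitude followed by a fixed relative threshold.
    The key point is that the SAME transform is applied to synthetic
    training images and captured test images."""
    g = img.astype(np.float64) / max(float(img.max()), 1.0)
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    ky = kx.T  # vertical-gradient Sobel kernel
    mag = np.hypot(_conv3x3(g, kx), _conv3x3(g, ky))
    if mag.max() == 0:
        return np.zeros_like(mag, dtype=np.uint8)  # blank image -> no edges
    return (mag > threshold * mag.max()).astype(np.uint8)

# Crude "synthetic glyph": a single bright vertical stroke on black,
# mimicking a rendered character before edge extraction.
synthetic = np.zeros((32, 64), dtype=np.float64)
synthetic[8:24, 10:14] = 1.0
edges = edge_map(synthetic)  # binary edge map fed to the CNN
```

In the pipeline the abstract describes, `edge_map` (or, in the actual work, Canny or thinning) would be applied to every image, synthetic and real alike, and only the resulting edge maps would be fed to the CNN, which is what lets knowledge learned on cheap synthetic data transfer to captured images.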