Mining the displacement of max-pooling for text recognition

Abstract The max-pooling operation in convolutional neural networks (CNNs) downsamples the feature maps of convolutional layers, but in doing so it discards some spatial information. In this paper, we extract a novel feature from pooling layers, called the displacement feature, and combine it with the features produced by max-pooling to capture structural deformations in text recognition tasks. The displacement features record the location of the maximal value within each max-pooling operation. Furthermore, we analyze and mine the class-wise trends of the displacement features. Extensive experimental results and discussions demonstrate that the proposed displacement features improve the performance of CNN-based architectures and address the structural-deformation issues of max-pooling in text recognition tasks.
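To make the idea of recording the location of the maximal value concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes non-overlapping k x k pooling windows on a single 2-D feature map and encodes each displacement as the (row, column) offset of the maximum inside its window; the function name `max_pool_with_displacement`, the offset encoding, and the concatenation used to combine the two feature types are illustrative assumptions, since the abstract does not specify them.

```python
import numpy as np

def max_pool_with_displacement(fmap, k=2):
    """Non-overlapping k x k max-pooling that also returns argmax offsets.

    Hypothetical helper: the (row, col) offset encoding is an assumption.
    """
    h, w = fmap.shape
    if h % k or w % k:
        raise ValueError("feature map size must be divisible by k")
    # Rearrange into (h//k, w//k) windows, each flattened to k*k values.
    windows = fmap.reshape(h // k, k, w // k, k).transpose(0, 2, 1, 3)
    flat = windows.reshape(h // k, w // k, k * k)
    pooled = flat.max(axis=-1)      # ordinary max-pooling output
    idx = flat.argmax(axis=-1)      # flat position of the max in each window
    # Convert the flat position to a (row, col) offset inside the window.
    disp = np.stack([idx // k, idx % k], axis=-1)
    return pooled, disp

fmap = np.array([[1, 5, 2, 0],
                 [3, 4, 7, 1],
                 [0, 0, 9, 8],
                 [6, 2, 3, 4]], dtype=float)

pooled, disp = max_pool_with_displacement(fmap)

# One simple (assumed) way to combine both feature types: concatenation.
combined = np.concatenate([pooled.ravel(), disp.ravel().astype(float)])
```

Here `pooled` holds the usual max-pooled values, while `disp` keeps the within-window position that max-pooling alone would discard; a downstream classifier could consume `combined` or treat the two parts separately.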
