Decomposition-Based Domain Adaptation for Real-World Font Recognition

We present a domain adaption framework to address a domain mismatch between synthetic training and real-world testing data. We demonstrate our method on a challenging fine-grain classification problem: recognizing a font style from an image of text. In this task, it is very easy to generate lots of rendered font examples but very hard to obtain real-world labeled images. This real-to-synthetic domain gap caused poor generalization to new real data in previous font recognition methods (Chen et al. (2014)). In this paper, we introduce a Convolutional Neural Network decomposition approach, leveraging a large training corpus of synthetic data to obtain effective features for classification. This is done using an adaptation technique based on a Stacked Convolutional Auto-Encoder that exploits a large collection of unlabeled real-world text images combined with synthetic data preprocessed in a specific way. The proposed DeepFont method achieves an accuracy of higher than 80% (top-5) on a new large labeled real-world dataset we collected.

[1]  Yoshua. Bengio,et al.  Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..

[2]  Jürgen Schmidhuber,et al.  Stacked Convolutional Auto-Encoders for Hierarchical Feature Extraction , 2011, ICANN.

[3]  Razvan Pascanu,et al.  Deep Learners Benefit More from Out-of-Distribution Examples , 2011, AISTATS.

[4]  Qiang Yang,et al.  Transferring Naive Bayes Classifiers for Text Classification , 2007, AAAI.

[5]  Yoshua Bengio,et al.  Greedy Layer-Wise Training of Deep Networks , 2006, NIPS.

[6]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[7]  Andrew Zisserman,et al.  Deep Features for Text Spotting , 2014, ECCV.

[8]  Jason Weston,et al.  Deep learning via semi-supervised embedding , 2008, ICML '08.

[9]  Andrew W. Fitzgibbon,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[10]  Tony X. Han,et al.  Large-Scale Visual Font Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[12]  Yoshua Bengio,et al.  Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach , 2011, ICML.

[13]  Kai Wang,et al.  End-to-end scene text recognition , 2011, 2011 International Conference on Computer Vision.

[14]  Jonathan Baxter,et al.  Learning internal representations , 1995, COLT '95.

[15]  Graham W. Taylor,et al.  Adaptive deconvolutional networks for mid and high level feature learning , 2011, 2011 International Conference on Computer Vision.

[16]  Vishal M. Patel,et al.  Joint Hierarchical Domain Adaptation and Feature Learning , 2013 .

[17]  Thomas S. Huang,et al.  An Analysis of Unsupervised Pre-training in Light of Recent Advances , 2014, ICLR.

[18]  Tao Wang,et al.  End-to-end text recognition with convolutional neural networks , 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).

[19]  Koby Crammer,et al.  A theory of learning from different domains , 2010, Machine Learning.

[20]  Yoshua Bengio,et al.  Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.

[21]  Graham W. Taylor,et al.  Deconvolutional networks , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[22]  Roland Memisevic,et al.  Zero-bias autoencoders and the benefits of co-adapting features , 2014, ICLR.

[23]  Azriel Rosenfeld,et al.  Face recognition: A literature survey , 2003, CSUR.

[24]  Rajat Raina,et al.  Self-taught learning: transfer learning from unlabeled data , 2007, ICML '07.