PCA-Initialized Deep Neural Networks Applied to Document Image Analysis

In this paper, we present a novel approach for initializing deep neural networks: using Principal Component Analysis (PCA) to initialize the weights of neural layers. Usually, the weights of a deep neural network are initialized in one of three ways: 1) with random values; 2) layer-wise, typically as a Deep Belief Network or as an auto-encoder; or 3) by re-using layers from another network (transfer learning). Consequently, either many training epochs are needed before meaningful weights are learned, or a rather similar dataset is required to seed the fine-tuning in the transfer-learning case. In this paper, we describe how to turn a PCA into an auto-encoder by generating an encoder layer from the PCA parameters and adding a corresponding decoding layer. We analyze this initialization technique on real documents. First, we show that PCA-based initialization is fast and yields very stable initial weights. Furthermore, for the task of layout analysis, we investigate the effectiveness of PCA-based initialization and show that it outperforms state-of-the-art random weight initialization methods.
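To make the construction concrete, here is a minimal sketch of how PCA parameters can populate an encoder layer and a tied decoding layer. This is an illustrative reconstruction of the general technique under our own assumptions, not the authors' exact pipeline: the function name, the shapes, and the plain-NumPy setting are ours.

```python
import numpy as np

def pca_autoencoder_init(X, k):
    """Derive linear encoder/decoder weights from a PCA of the data.

    X : (n_samples, d) data matrix (e.g. flattened image patches)
    k : number of principal components = number of hidden units
    Returns (W_enc, b_enc, W_dec, b_dec) with
    encode(x) = W_enc @ x + b_enc and decode(h) = W_dec @ h + b_dec.
    """
    mean = X.mean(axis=0)          # per-feature mean
    Xc = X - mean                  # center the data
    # SVD of the centered data; rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W_enc = Vt[:k]                 # (k, d): top-k components as encoder weights
    b_enc = -W_enc @ mean          # so that encode(x) = W_enc @ (x - mean)
    W_dec = W_enc.T                # (d, k): decoder tied to the encoder transpose
    b_dec = mean                   # add the mean back on reconstruction
    return W_enc, b_enc, W_dec, b_dec

# Hypothetical usage: seed a layer with these weights, then train as usual.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))
W_enc, b_enc, W_dec, b_dec = pca_autoencoder_init(X, k=16)
recon = (X @ W_enc.T + b_enc) @ W_dec.T + b_dec
print("PCA reconstruction MSE:", np.mean((X - recon) ** 2))
```

With tied weights (W_dec = W_enc.T), this initialization reproduces the PCA reconstruction exactly before any gradient step is taken, which is consistent with the paper's observation that PCA-based initialization is quick and stable compared with random initialization.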
