PCA-Initialized Deep Neural Networks Applied to Document Image Analysis

In this paper, we present a novel approach for initializing deep neural networks: using Principal Component Analysis (PCA) to initialize the weights of neural layers. Usually, the weights of a deep neural network are initialized in one of three ways: 1) with random values; 2) layer-wise, typically as a Deep Belief Network or as an auto-encoder; or 3) by re-using layers from another network (transfer learning). Consequently, either many training epochs are needed before meaningful weights are learned, or a rather similar dataset is required to seed the fine-tuning in the transfer-learning case. In this paper, we describe how to turn a PCA into an auto-encoder by generating an encoder layer from the PCA parameters and adding a corresponding decoding layer. We analyze this initialization technique on real documents. First, we show that PCA-based initialization is fast and yields very stable initial weights. Furthermore, for the task of layout analysis, we investigate the effectiveness of PCA-based initialization and show that it outperforms state-of-the-art random weight initialization methods.
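To make the construction concrete, here is a minimal sketch of how PCA parameters can populate an encoder layer and a tied decoding layer. This is an illustrative reconstruction of the general technique under our own assumptions, not the authors' exact pipeline: the function name, the shapes, and the plain-NumPy setting are ours.

```python
import numpy as np

def pca_autoencoder_init(X, k):
    """Derive linear encoder/decoder weights from a PCA of the data.

    X : (n_samples, d) data matrix (e.g. flattened image patches)
    k : number of principal components = number of hidden units
    Returns (W_enc, b_enc, W_dec, b_dec) with
    encode(x) = W_enc @ x + b_enc and decode(h) = W_dec @ h + b_dec.
    """
    mean = X.mean(axis=0)          # per-feature mean
    Xc = X - mean                  # center the data
    # SVD of the centered data; rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W_enc = Vt[:k]                 # (k, d): top-k components as encoder weights
    b_enc = -W_enc @ mean          # so that encode(x) = W_enc @ (x - mean)
    W_dec = W_enc.T                # (d, k): decoder tied to the encoder transpose
    b_dec = mean                   # add the mean back on reconstruction
    return W_enc, b_enc, W_dec, b_dec

# Hypothetical usage: seed a layer with these weights, then train as usual.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))
W_enc, b_enc, W_dec, b_dec = pca_autoencoder_init(X, k=16)
recon = (X @ W_enc.T + b_enc) @ W_dec.T + b_dec
print("PCA reconstruction MSE:", np.mean((X - recon) ** 2))
```

With tied weights (W_dec = W_enc.T), this initialization reproduces the PCA reconstruction exactly before any gradient step is taken, which is consistent with the paper's observation that PCA-based initialization is quick and stable compared with random initialization.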
