A fast and efficient pre-training method based on layer-by-layer maximum discrimination for deep neural networks

In this paper, two fast and efficient layer-by-layer pre-training methods, which extend existing approaches and are based on error minimization, are proposed for initializing deep neural network (DNN) weights. Because training confronts a large number of local minima, DNN training often fails to converge. Properly initializing the DNN weights, instead of using random values at the start of training, makes it possible to avoid many of these local minima. The first version of the proposed method pre-trains a deep bottleneck neural network (DBNN): the DBNN is decomposed into corresponding single-hidden-layer bottleneck neural networks (BNNs), which are trained first, and the weight values resulting from their training are then transferred to the DBNN. The proposed method was used to pre-train a five-hidden-layer DBNN that extracts the non-linear principal components of face images in the Bosphorus database. Comparing a randomly initialized DBNN with a DBNN pre-trained by the layer-by-layer method shows that the method not only increases the convergence rate of training but also improves generalizability. Furthermore, it yields higher efficiency and faster convergence than several previous pre-training methods. This paper also presents a bidirectional version of the layer-by-layer pre-training method for pre-training hetero-associative DNNs; it pre-trains the DNN weights in the forward and backward directions in parallel. Bidirectional layer-by-layer pre-training was used to pre-train the weights of a classifier DNN, improving both the training speed and the recognition rate on the Bosphorus and CK+ databases.
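The decomposition described above can be sketched as follows. This is a minimal illustrative implementation, not the paper's exact procedure: each single-hidden-layer bottleneck network is trained by plain gradient descent to reconstruct its input, and its encoder weights (plus the hidden codes it produces) seed the next stage. The layer sizes, activation, learning rate, and epoch count are all assumptions chosen for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_bnn(X, hidden, epochs=200, lr=0.1):
    """Train one single-hidden-layer bottleneck autoencoder on X.

    Returns (W_enc, b_enc, H): the encoder weights that initialize one
    layer of the deep net, and the hidden codes H that serve as the
    training input for the next BNN in the stack."""
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.1, (hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)            # encode through the bottleneck
        Y = H @ W2 + b2                     # linear decode (reconstruction)
        err = Y - X                         # reconstruction error
        # backpropagation of the mean squared reconstruction error
        dW2 = H.T @ err / n; db2 = err.mean(0)
        dH = (err @ W2.T) * (1 - H ** 2)    # tanh derivative
        dW1 = X.T @ dH / n; db1 = dH.mean(0)
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    return W1, b1, np.tanh(X @ W1 + b1)

# Greedy stack: pre-train each layer on the previous layer's hidden codes.
X = rng.normal(size=(256, 20))              # toy data standing in for images
weights, H = [], X
for size in (12, 6, 3):                     # bottleneck narrows layer by layer
    W, b, H = train_bnn(H, size)
    weights.append((W, b))                  # would initialize the deep net
```

After this loop, `weights` holds the pre-trained encoder parameters that would replace random initialization in the deep network before fine-tuning the whole stack end to end.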
