Deep Learning of Constrained Autoencoders for Enhanced Understanding of Data

Unsupervised feature extractors are known to produce efficient and discriminative representations of data. Insight into the mappings they perform, and the human ability to understand them, however, remain very limited. This is especially prominent when multilayer deep learning architectures are used. This paper demonstrates how to remove these bottlenecks within the architecture of a non-negativity-constrained autoencoder. It is shown that, by using both L1 and L2 regularization to induce non-negativity of weights, most of the weights in the network become non-negative, resulting in a more understandable structure with only minor deterioration in classification accuracy. In addition, the proposed approach extracts features that are sparser and produces additional sparsification of the output layer. The method is analyzed for accuracy and feature interpretation on the MNIST data, the NORB normalized uniform object data, and the Reuters text categorization data set.
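To make the idea concrete, the sketch below shows one way a composite L1/L2 penalty on negative weights could be folded into an autoencoder's reconstruction loss. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function names (nonneg_penalty, autoencoder_loss) and the hyperparameters alpha1 and alpha2 are hypothetical placeholders for whatever the actual method uses.

```python
import numpy as np

def nonneg_penalty(W, alpha1=1e-3, alpha2=3e-3):
    # Composite L1/L2 penalty applied only to the negative entries of W.
    # The penalty vanishes for W >= 0, so minimizing it pushes weights
    # toward the non-negative orthant without hard-clipping them.
    neg = np.minimum(W, 0.0)
    return alpha1 * np.abs(neg).sum() + alpha2 * (neg ** 2).sum()

def nonneg_penalty_grad(W, alpha1=1e-3, alpha2=3e-3):
    # Gradient of the penalty w.r.t. W; zero wherever W >= 0.
    # For w < 0: d/dw [alpha1*|w| + alpha2*w^2] = -alpha1 + 2*alpha2*w.
    neg_mask = (W < 0).astype(W.dtype)
    return neg_mask * (-alpha1 + 2.0 * alpha2 * W)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def autoencoder_loss(X, W1, b1, W2, b2, alpha1=1e-3, alpha2=3e-3):
    # Mean squared reconstruction error plus the non-negativity penalty
    # on both the encoder and decoder weight matrices.
    H = sigmoid(X @ W1 + b1)        # encoder activations
    X_hat = sigmoid(H @ W2 + b2)    # reconstruction
    recon = 0.5 * np.mean(np.sum((X_hat - X) ** 2, axis=1))
    return recon + nonneg_penalty(W1, alpha1, alpha2) \
                 + nonneg_penalty(W2, alpha1, alpha2)

# Toy usage: random data and small random weights, just to exercise the loss.
rng = np.random.default_rng(0)
X = rng.random((8, 20))
W1, b1 = 0.1 * rng.standard_normal((20, 5)), np.zeros(5)
W2, b2 = 0.1 * rng.standard_normal((5, 20)), np.zeros(20)
print(autoencoder_loss(X, W1, b1, W2, b2))
```

Note that because the penalty and its gradient are exactly zero on the non-negative orthant, weights that have crossed into positivity feel no further pull; this soft formulation is consistent with the abstract's statement that most, rather than all, weights end up non-negative.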
