Deep Logic Networks: Inserting and Extracting Knowledge From Deep Belief Networks

Developments in deep learning have seen the use of layerwise unsupervised learning combined with supervised learning for fine-tuning. With this layerwise approach, a deep network can be seen as a more modular system that lends itself well to learning representations. In this paper, we investigate whether such modularity can be useful for the insertion of background knowledge into deep networks and for the extraction of knowledge from trained deep networks: whether inserting knowledge, when it is available, can improve learning performance, and whether extracting knowledge can offer a better understanding of the representations learned by such networks. To this end, we use a simple symbolic language, a set of logical rules that we call confidence rules, and show that it is suitable for the representation of quantitative reasoning in deep networks. Through knowledge extraction, we show that confidence rules offer a low-cost representation of layerwise networks (restricted Boltzmann machines). We also show that layerwise extraction can improve the accuracy of deep belief networks. Furthermore, the proposed symbolic characterization of deep networks provides a novel method for the insertion of prior knowledge into, and the training of, deep networks. Using this method, we propose and evaluate a deep neural-symbolic system; the experimental results indicate that modularity through the use of confidence rules and knowledge insertion can be beneficial to network performance.
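
As a rough illustration of the layerwise extraction idea described above, the sketch below derives rule-like descriptions from a single RBM weight matrix by thresholding connection weights and turning each strongly connected hidden unit into a rule over visible literals. The function name `extract_confidence_rules`, the threshold-and-average confidence scheme, and the toy weight matrix are illustrative assumptions only, not the paper's exact confidence-rule algorithm.

```python
import numpy as np

def extract_confidence_rules(W, threshold=0.1):
    """Hypothetical sketch of layerwise rule extraction from an RBM.

    W is assumed to have shape (n_visible, n_hidden). For each hidden
    unit h_j, visible units whose connection weight exceeds `threshold`
    in magnitude become literals of a rule  c_j : h_j <-> x_i AND ... AND NOT x_k,
    where the sign of the weight decides whether the literal is negated
    and the mean absolute weight serves as the confidence value c_j.
    """
    rules = []
    n_visible, n_hidden = W.shape
    for j in range(n_hidden):
        literals, weights = [], []
        for i in range(n_visible):
            w = W[i, j]
            if abs(w) >= threshold:
                literals.append("x%d" % i if w > 0 else "NOT x%d" % i)
                weights.append(abs(w))
        if literals:
            rules.append((float(np.mean(weights)), "h%d" % j, literals))
    return rules

# Toy example: a random 6x3 matrix standing in for a trained RBM layer.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(6, 3))
for c, head, body in extract_confidence_rules(W, threshold=0.3):
    print("%.2f : %s <-> %s" % (c, head, " AND ".join(body)))
```

In a layerwise setting, a sketch like this would be applied to each RBM of the stack in turn, so that the extracted rules describe the learned representation one layer at a time.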
