Forgetting and consolidation for incremental and cumulative knowledge acquisition systems

The application of cognitive mechanisms to knowledge acquisition is, in our view, crucial for making the resulting models coherent, efficient, credible, easy to use and understandable. Two characteristic features of intelligence are essential for knowledge development: forgetting and consolidation. Both play an important role in knowledge bases and learning systems: they prevent information overflow and redundancy, preserve and strengthen important or frequently used rules, and remove (or forget) useless ones. We present an incremental, lifelong view of knowledge acquisition that aims to improve performance task after task by determining what to keep, what to consolidate and what to forget, thereby addressing the stability-plasticity dilemma. To this end, we rate rules with several metrics derived from what is, to our knowledge, the first adaptation of the Minimum Message Length (MML) principle to a coverage graph, a hierarchical assessment structure that treats evidence and rules in a unified way. The metrics are used not only to forget the worst rules, but also to drive a consolidation process that promotes selected rules to the knowledge base, mirrored by a demotion mechanism. We evaluate the framework on a series of tasks in a chess rule learning domain.
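To make the promotion/demotion idea concrete, the Python sketch below shows one minimal way an MML-style two-part score could drive forgetting and consolidation. It is an illustration only: the `Rule` fields, the `mml_score` function, and the `promote_below`/`forget_above` thresholds are hypothetical simplifications, and the coverage graph of the actual framework is collapsed here into simple coverage counts.

```python
import math
from dataclasses import dataclass

@dataclass
class Rule:
    name: str
    length_bits: float   # hypothetical cost of encoding the rule itself
    covered: int         # evidence items the rule covers
    exceptions: int      # covered items the rule gets wrong

def mml_score(rule: Rule, total_evidence: int) -> float:
    """Two-part MML-style message length (lower is better):
    bits to state the rule plus bits to state the evidence given the rule.
    Items the rule does not explain must be encoded separately."""
    residual = total_evidence - rule.covered + rule.exceptions
    # Hypothetical flat per-item cost for unexplained evidence.
    data_bits = residual * math.log2(max(total_evidence, 2))
    return rule.length_bits + data_bits

def step(working: list[Rule], knowledge_base: list[Rule],
         total_evidence: int,
         promote_below: float, forget_above: float) -> None:
    """One consolidation/forgetting pass: promote cheap rules to the
    knowledge base, forget expensive ones, keep the rest in working memory."""
    for rule in list(working):
        score = mml_score(rule, total_evidence)
        if score <= promote_below:
            working.remove(rule)
            knowledge_base.append(rule)   # consolidation (promotion)
        elif score >= forget_above:
            working.remove(rule)          # forgetting
    # Demotion mirrors promotion: a consolidated rule whose score
    # degrades is moved back to working memory.
    for rule in list(knowledge_base):
        if mml_score(rule, total_evidence) >= forget_above:
            knowledge_base.remove(rule)
            working.append(rule)

# Toy usage with invented rules: the first is promoted, the second forgotten.
working = [Rule("king_moves_one", 40.0, covered=90, exceptions=2),
           Rule("noise_rule", 120.0, covered=3, exceptions=1)]
kb: list[Rule] = []
step(working, kb, total_evidence=100, promote_below=150.0, forget_above=600.0)
```

The design point the sketch captures is that a single score balances rule complexity against explanatory coverage, so the same metric can decide both what to consolidate and what to forget.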
