Bootstrap Learning via Modular Concept Discovery

Suppose a learner is faced with a domain of problems about which it knows nearly nothing. It does not know the distribution of problems, the space of solutions is not smooth, and the reward signal is uninformative, providing perhaps a few bits of information but not enough to steer the learner effectively. How can such a learner ever get off the ground? A common intuition is that if the solutions to these problems share a common structure, and the learner can solve some simple problems by brute force, it should be able to extract useful components from these solutions and, by composing them, explore the solution space more efficiently. Here, we formalize this intuition for the case where the solution space is that of typed functional programs and the acquired information is stored as a stochastic grammar over programs. We propose an iterative procedure for exploring such spaces: in the first step of each iteration, the learner explores a finite subset of the domain, guided by a stochastic grammar; in the second step, the learner compresses the successful solutions from the first step to estimate a new stochastic grammar. We test this procedure on symbolic regression and Boolean circuit learning and show that the learner discovers modular concepts in both domains. Whereas the learner can solve almost none of the posed problems in the procedure's first iteration, it rapidly becomes able to solve a large fraction by acquiring abstract knowledge of the structure of the solution space.
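To make the two-step loop concrete, here is a minimal Python sketch of one way such an explore-compress iteration could look. It substitutes a toy arithmetic DSL for the paper's typed functional programs, and every name in it (`PRIMITIVES`, `sample_program`, `explore_compress`, the tasks) is an illustrative assumption rather than the paper's actual implementation. In particular, the compression step here only re-estimates production weights of the grammar; the paper's procedure additionally extracts shared subprograms as new modular concepts.

```python
import random
from collections import Counter

# Each primitive maps to (arity, constructor); constructors build
# functions of a single numeric input x.
PRIMITIVES = {
    "x":   (0, lambda: (lambda x: x)),
    "one": (0, lambda: (lambda x: 1)),
    "add": (2, lambda f, g: (lambda x: f(x) + g(x))),
    "mul": (2, lambda f, g: (lambda x: f(x) * g(x))),
}

def sample_program(weights, depth=0, max_depth=4):
    """Sample an expression tree from the current stochastic grammar."""
    names = [n for n in PRIMITIVES
             if depth < max_depth or PRIMITIVES[n][0] == 0]
    total = sum(weights[n] for n in names)
    r, acc = random.random() * total, 0.0
    for name in names:
        acc += weights[name]
        if r <= acc:
            break
    arity = PRIMITIVES[name][0]
    return (name, [sample_program(weights, depth + 1, max_depth)
                   for _ in range(arity)])

def evaluate(tree):
    name, children = tree
    _, build = PRIMITIVES[name]
    return build(*[evaluate(c) for c in children])

def solves(tree, task):
    """A task is a list of (input, output) pairs (symbolic regression)."""
    f = evaluate(tree)
    return all(f(x) == y for x, y in task)

def count_primitives(tree, counts):
    name, children = tree
    counts[name] += 1
    for c in children:
        count_primitives(c, counts)

def explore_compress(tasks, iterations=5, budget=2000):
    weights = {n: 1.0 for n in PRIMITIVES}   # uniform initial grammar
    for it in range(iterations):
        # Step 1 (explore): sample programs under the current grammar,
        # recording the first solution found for each task.
        solutions = {}
        for _ in range(budget):
            p = sample_program(weights)
            for i, task in enumerate(tasks):
                if i not in solutions and solves(p, task):
                    solutions[i] = p
        # Step 2 (compress): re-estimate production weights from the
        # successful solutions (with light smoothing so no primitive
        # ever becomes unreachable).
        counts = Counter()
        for p in solutions.values():
            count_primitives(p, counts)
        total = sum(counts.values()) or 1
        weights = {n: 0.1 + counts[n] / total for n in PRIMITIVES}
        print(f"iteration {it}: solved {len(solutions)}/{len(tasks)} tasks")

if __name__ == "__main__":
    # Toy tasks: f(x) = x + 1, f(x) = x^2, and f(x) = 2x on inputs 0..4.
    tasks = [[(x, x + 1) for x in range(5)],
             [(x, x * x) for x in range(5)],
             [(x, 2 * x) for x in range(5)]]
    explore_compress(tasks)
```

The essential bootstrap dynamic survives even in this reduced form: solutions found by near-blind sampling in early iterations reshape the grammar, which in turn concentrates later sampling on the regions of program space where solutions actually live.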
