Dimensionality Reduction via Program Induction

This work proposes a machine learning algorithm for the inductive synthesis of programs. The objective of the algorithm is to learn a distribution over programs that evaluate to a given constant. The algorithm searches for programs in a bottom-up fashion, going directly from the data to the program, in contrast to recent, similar work that adopts a generate-and-test approach. I use this algorithm to perform density estimation over strings. I then show that the estimated density supports two distinct kinds of dimensionality reduction on strings: first, converting strings into real-valued vectors; and second, converting strings into compressive symbolic descriptions. The goals of this work are to demonstrate a bottom-up approach to program induction and to evaluate the two types of dimensionality reduction that this approach suggests.
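To make the phrase "a distribution over programs that evaluate to a given constant" concrete, the following toy sketch may help. It is not the algorithm proposed in this work. It assumes a hypothetical two-construct language (string literals and binary concatenation), inverts evaluation by splitting the target string rather than generating candidates and testing them, and weights each resulting program by an assumed prior proportional to exp(-size). The names `invert`, `evaluate`, `size`, and `distribution` are illustrative only.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def invert(target):
    """Enumerate every program in the toy language that evaluates to `target`.

    Programs are nested tuples: ("lit", s) or ("concat", left, right).
    The search runs from the data to the program: each split point of the
    string yields sub-targets that are inverted recursively.
    """
    programs = [("lit", target)]
    for i in range(1, len(target)):
        for left in invert(target[:i]):
            for right in invert(target[i:]):
                programs.append(("concat", left, right))
    return tuple(programs)


def evaluate(program):
    """Evaluate a toy program back to the string it denotes."""
    if program[0] == "lit":
        return program[1]
    return evaluate(program[1]) + evaluate(program[2])


def size(program):
    """Count the nodes in a program tree."""
    if program[0] == "lit":
        return 1
    return 1 + size(program[1]) + size(program[2])


def distribution(target):
    """Normalize an assumed exp(-size) weighting over the programs for `target`."""
    programs = invert(target)
    weights = [math.exp(-size(p)) for p in programs]
    total = sum(weights)
    return {p: w / total for p, w in zip(programs, weights)}


if __name__ == "__main__":
    for program, prob in distribution("abc").items():
        print(f"{prob:.3f}  {program}")
```

Every enumerated program evaluates to the target by construction, so no candidate is ever rejected. The number of programs grows quickly with string length, so this exhaustive version is only practical for very short strings.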
