Intelligence as Inference or Forcing Occam on the World

We propose to perform the optimization task of Universal Artificial Intelligence (UAI) by learning a reference machine on which good programs are short. We further acknowledge that the choice of reference machine on which the UAI objective is based is arbitrary, and we therefore learn a machine suited to the environment we are in. This is based on viewing Occam's razor as an imperative rather than as a proposition about the world: since the principle cannot be true for all reference machines, we must find a machine that makes it true. We want both good policies and the environment itself to have short implementations on the machine. Such a machine is learnt iteratively through a procedure that generalizes the principle underlying the Expectation-Maximization algorithm.
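The abstract does not spell out the iterative procedure, but the EM-style alternation it names can be pictured as follows. In this minimal sketch, everything concrete is an assumption made for illustration, not the authors' construction: the "machine" is reduced to a toy probability table over program tokens (so description length is just code length under that table), the candidate programs are fixed, and the machine update is a smoothed count re-fit.

```python
# Hedged sketch of an EM-style alternation for learning a reference machine.
# The toy token-probability "machine", the candidate set, and the count-based
# re-fit below are illustrative assumptions, not the paper's actual procedure.

import math

def description_length(machine, program):
    """Code length of `program` under `machine`, a dict mapping program
    tokens to probabilities (likelier tokens get shorter codes)."""
    return sum(-math.log2(machine[tok]) for tok in program)

def select_short_program(machine, candidates):
    """E-step analogue: among candidate programs, pick the one the current
    machine makes shortest (a stand-in for finding good policies and
    environment models with short implementations)."""
    return min(candidates, key=lambda p: description_length(machine, p))

def refit_machine(programs, alphabet, smoothing=1.0):
    """M-step analogue: reshape the machine so the selected programs
    become short, by giving frequent tokens higher probability."""
    counts = {tok: smoothing for tok in alphabet}
    for prog in programs:
        for tok in prog:
            counts[tok] += 1
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

alphabet = ["mov", "add", "jmp", "halt"]
machine = {tok: 1 / len(alphabet) for tok in alphabet}  # uniform start
candidates = [["mov", "add", "halt"], ["jmp", "jmp", "mov", "halt"]]

selected = []
for _ in range(5):  # alternate: select short programs, then reshape the machine
    selected.append(select_short_program(machine, candidates))
    machine = refit_machine(selected, alphabet)
```

The analogy to EM is that the selection step plays the role of posterior inference over latent quantities (here, which programs to treat as the short ones), while the re-fit plays the role of parameter maximization (updating the machine so Occam's razor holds for exactly those programs).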
