Representation Discovery in Planning using Harmonic Analysis

This paper summarizes ongoing research on a framework for representation learning using harmonic analysis, a subfield of mathematics. Harmonic analysis includes Fourier analysis, where new eigenvector representations are constructed by diagonalizing operators, and wavelet analysis, where new representations are constructed by dilation. The approach is presented in the context of Markov decision processes (MDPs), a widely studied model of planning under uncertainty, although it is applicable more broadly to other areas of AI as well. The paper describes a novel harmonic analysis framework for planning based on estimating a diffusion model of the flow of information on a graph (discrete state space) or a manifold (continuous state space) using a discrete form of the Laplace heat equation. Two methods for constructing novel plan representations from diffusion models are described: Fourier methods diagonalize a symmetric diffusion operator called the Laplacian, while wavelet methods progressively dilate unit basis functions using powers of the diffusion operator. A new planning framework called Representation Policy Iteration (RPI) is described, consisting of an outer loop that estimates new basis functions by diagonalization or dilation and an inner loop that learns the best policy representable within the linear span of the current basis functions. We demonstrate the flexibility of the approach, which allows basis functions to be adapted both to a particular task or reward function and to the hierarchical, temporally extended nature of actions.

Motivation

The ability to learn and modify representations has long been considered a cornerstone of intelligence. The challenge of representation learning has been studied by researchers across a wide variety of subfields in AI and cognitive science, from computer vision (Marr 1982) to problem solving (Amarel 1968). In this paper, we present our ongoing research on a general framework for representation discovery that builds on recent work in harmonic analysis, a subfield of mathematics that includes traditional Fourier and wavelet methods (Mallat 1998). Recent work in harmonic analysis has extended the scope of these traditional analytic tools, originally studied in Euclidean spaces, to more general discrete spaces (graphs) and continuous spaces (manifolds). For example, spectral graph theory (Chung 1997) studies the properties of the Laplacian, whose eigenvectors can be viewed as a Fourier basis on graphs. More recently, research in computational harmonic analysis has extended multiresolution wavelet methods to graphs (Coifman & Maggioni 2006). We build on these advances by showing how agents embedded in stochastic environments can learn novel representations for planning, and then modify or augment these representations to take into account rewards or the nature of actions. The approach is more broadly applicable to a wide variety of other areas in AI, including perception, information extraction, and robotics, but we confine our presentation to planning under uncertainty. The framework provides general ways of constructing multiscale representations of stochastic processes such as Markov chains and Markov decision processes.
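To make the two constructions concrete, the following Python/NumPy sketch (not taken from the paper) builds both kinds of representation on a small grid graph standing in for an MDP's state-space connectivity: the Fourier view keeps the smoothest eigenvectors of the normalized graph Laplacian as basis functions, and the dilation view applies powers of the random-walk diffusion operator to a unit basis vector. The grid_adjacency helper, the graph itself, and all parameter choices are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def grid_adjacency(n):
    """Adjacency matrix of an n x n grid graph (an illustrative stand-in for
    the state-space connectivity of an MDP)."""
    N = n * n
    W = np.zeros((N, N))
    for r in range(n):
        for c in range(n):
            i = r * n + c
            if r + 1 < n:                      # edge to the cell below
                j = (r + 1) * n + c
                W[i, j] = W[j, i] = 1.0
            if c + 1 < n:                      # edge to the cell to the right
                j = r * n + (c + 1)
                W[i, j] = W[j, i] = 1.0
    return W

def laplacian_basis(W, k):
    """Fourier view: the k smoothest eigenvectors of the normalized graph
    Laplacian L = I - D^{-1/2} W D^{-1/2} serve as basis functions over states."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(W.shape[0]) - D_inv_sqrt @ W @ D_inv_sqrt
    eigvals, eigvecs = np.linalg.eigh(L)       # symmetric eigendecomposition, ascending
    return eigvecs[:, :k]                      # columns = basis functions

# Example: a 10x10 grid world and 20 Laplacian basis functions.
W = grid_adjacency(10)
Phi = laplacian_basis(W, k=20)                 # shape (100, 20): one feature row per state

# Dilation view (wavelet-style): powers of the random-walk operator T = D^{-1} W
# progressively spread a unit basis vector over the graph. This sketches only the
# dilation idea, not the full diffusion-wavelet construction.
T = np.diag(1.0 / W.sum(axis=1)) @ W
delta = np.zeros(W.shape[0])
delta[0] = 1.0
spread = np.linalg.matrix_power(T, 8) @ delta
```

In the RPI framework described above, basis functions produced in this way would feed the inner loop, which searches for the best value function and policy expressible as a linear combination of the columns of Phi.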
Our approach builds on recent work in machine learning that focuses on modeling the nonlinear geometry of the space underlying many real-world datasets. Nonlinear dimensionality reduction methods have recently emerged that empirically model and recover the underlying manifold, for example multidimensional scaling (Borg & Groenen 1996), LLE (Roweis & Saul 2000), ISOMAP (Tenenbaum, de Silva, & Langford 2000), Laplacian eigenmaps (Belkin & Niyogi 2003), and diffusion maps (Coifman et al. 2005). These techniques can be significantly more powerful than well-studied linear Euclidean subspace methods such as principal components analysis (Jolliffe 1986).

Markov Decision Processes and Function Approximation

A Markov decision process M = ⟨S, A, P^a_{ss'}, R^a_{ss'}, γ⟩ is defined by a set of states S, a set of actions A, a transition model P^a_{ss'} specifying the probability of reaching state s' when performing action a in state s, a reward model R^a_{ss'} specifying the scalar reward received on that transition, and a discount factor γ ∈ [0, 1) (Puterman 1994). The discount factor balances the agent's desire for immediate (small γ) versus future (large γ) rewards.
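As a concrete illustration of "the best value function representable within the linear span" of a set of basis functions, the following NumPy sketch evaluates a fixed policy exactly on a toy chain MDP and then projects the result onto a small feature matrix by least squares. The chain, the polynomial features, and the model-based evaluation are illustrative assumptions; the RPI inner loop would instead use a sample-based method such as least-squares policy iteration.

```python
import numpy as np

# Toy chain MDP under a fixed policy: from each state, move right with
# probability 0.9 and stay put with probability 0.1 (illustrative only).
n, gamma = 20, 0.9
P = np.zeros((n, n))
for s in range(n):
    P[s, min(s + 1, n - 1)] += 0.9
    P[s, s] += 0.1
r = np.zeros(n)
r[-1] = 1.0                                    # reward only in the last state

# Exact policy evaluation: V = (I - gamma * P)^{-1} r.
V_exact = np.linalg.solve(np.eye(n) - gamma * P, r)

# Hypothetical basis: low-order polynomial features of the normalized state index.
x = np.arange(n) / (n - 1)
Phi = np.vstack([x**k for k in range(4)]).T    # shape (n, 4)

# Best approximation of V within the span of the basis (least-squares projection).
w, *_ = np.linalg.lstsq(Phi, V_exact, rcond=None)
V_hat = Phi @ w
print("max approximation error:", np.max(np.abs(V_hat - V_exact)))
```

In practice the exact value function is not available; a sample-based inner loop replaces this projection with an estimate from trajectories, but the role of the basis, defining the space of representable value functions, is the same.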

References

[1] Chris Watkins et al. Learning from delayed rewards, 1989.

[2] I. T. Jolliffe et al. Generalizations and Adaptations of Principal Component Analysis, 1986.

[3] Alicia P. Wolfe et al. Identifying useful subgoals in reinforcement learning by local graph partitioning, 2005, ICML.

[4] Mikhail Belkin et al. Laplacian Eigenmaps for Dimensionality Reduction and Data Representation, 2003, Neural Computation.

[5] Petros Drineas et al. On the Nyström Method for Approximating a Gram Matrix for Improved Kernel-Based Learning, 2005, J. Mach. Learn. Res.

[6] Sridhar Mahadevan et al. Representation Policy Iteration, 2005, UAI.

[7] Andrew G. Barto et al. Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density, 2001, ICML.

[8] Sridhar Mahadevan et al. Learning Representation and Control in Continuous Markov Decision Processes, 2006, AAAI.

[9] M. Maggioni et al. Geometric diffusions as a tool for harmonic analysis and structure definition of data, Part I: Diffusion maps, 2005.

[10] Richard S. Sutton et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.

[11] Arthur D. Szlam et al. Diffusion wavelet packets, 2006.

[12] Sridhar Mahadevan et al. Learning state-action basis functions for hierarchical MDPs, 2007, ICML.

[13] Lihong Li et al. Analyzing feature generation for value-function approximation, 2007, ICML.

[14] R. Coifman et al. A general framework for adaptive regularization based on diffusion processes on graphs, 2006.

[15] S. Mallat. A wavelet tour of signal processing, 1998.

[16] Andrew W. Moore et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function, 1994, NIPS.

[17] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.

[18] Sridhar Mahadevan et al. Fast direct policy evaluation using multiscale analysis of Markov diffusion processes, 2006, ICML.

[19] S. T. Roweis et al. Nonlinear dimensionality reduction by locally linear embedding, 2000, Science.

[20] Shie Mannor et al. Automatic basis function construction for approximate dynamic programming and reinforcement learning, 2006, ICML.

[21] Sridhar Mahadevan et al. Constructing basis functions from directed graphs for value function approximation, 2007, ICML.

[22] J. Tenenbaum et al. A global geometric framework for nonlinear dimensionality reduction, 2000, Science.

[23] Doina Precup et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.

[24] Michail G. Lagoudakis et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.

[25] Shie Mannor et al. Q-Cut - Dynamic Discovery of Sub-goals in Reinforcement Learning, 2002, ECML.

[26] R. Taylor et al. The Numerical Treatment of Integral Equations, 1978.

[27] P. Groenen et al. Modern Multidimensional Scaling: Theory and Applications, 1999.

[28] Richard S. Sutton et al. Dimensions of Reinforcement Learning, 1998.

[29] Marek Petrik et al. An Analysis of Laplacian Methods for Value Function Approximation in MDPs, 2007, IJCAI.

[30] W. Smart et al. Manifold Representations for Value-Function Approximation, 2004.

[31] Fan Chung et al. Spectral Graph Theory, 1996.

[32] Sridhar Mahadevan et al. Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions, 2005, NIPS.

[33] R. Coifman et al. Diffusion Wavelets, 2004.

[34] R. Weale. Vision. A Computational Investigation into the Human Representation and Processing of Visual Information. David Marr, 1983.

[35] Saul Amarel et al. On representations of problems of reasoning about actions, 1968.

[37] S. Rosenberg. The Laplacian on a Riemannian Manifold, 1997.

[38] Ann B. Lee et al. Geometric diffusions as a tool for harmonic analysis and structure definition of data: diffusion maps, 2005, Proceedings of the National Academy of Sciences of the United States of America.

[39] Shie Mannor et al. Basis Function Adaptation in Temporal Difference Reinforcement Learning, 2005, Ann. Oper. Res.

[40] Andrew G. Barto et al. Reinforcement learning, 1998.

[41] G. Dunteman. Principal Components Analysis, 1989.