Representation Discovery in Planning using Harmonic Analysis

This paper summarizes ongoing research on a framework for representation learning using harmonic analysis, a subfield of mathematics. Harmonic analysis includes Fourier analysis, where new eigenvector representations are constructed by diagonalizing operators, and wavelet analysis, where new representations are constructed by dilation. The approach is presented in the context of Markov decision processes (MDPs), a widely studied model of planning under uncertainty, although it is applicable more broadly to other areas of AI as well. The paper describes a novel harmonic analysis framework for planning based on estimating a diffusion model of the flow of information on a graph (discrete state space) or a manifold (continuous state space) using a discrete form of the Laplace heat equation. Two methods for constructing novel plan representations from diffusion models are described: Fourier methods diagonalize a symmetric diffusion operator called the Laplacian, while wavelet methods progressively dilate unit basis functions using powers of the diffusion operator. A new planning framework called Representation Policy Iteration (RPI) is described, consisting of an outer loop that estimates new basis functions by diagonalization or dilation and an inner loop that learns the best policy representable within the linear span of the current basis functions. We demonstrate the flexibility of the approach, which allows basis functions to be adapted both to a particular task or reward function and to the hierarchical, temporally extended nature of actions.

Motivation

The ability to learn and modify representations has long been considered a cornerstone of intelligence. The challenge of representation learning has been studied by researchers across a wide variety of subfields in AI and cognitive science, from computer vision (Marr 1982) to problem solving (Amarel 1968). In this paper, we present our ongoing research on a general framework for representation discovery that builds on recent work in harmonic analysis, a subfield of mathematics that includes traditional Fourier and wavelet methods (Mallat 1998). Recent work in harmonic analysis has extended the scope of these traditional analytic tools, originally studied in Euclidean spaces, to more general discrete spaces (graphs) and continuous spaces (manifolds). For example, spectral graph theory (Chung 1997) studies the properties of the Laplacian, whose eigenvectors can be viewed as a Fourier basis on graphs. More recently, research in computational harmonic analysis has extended multiresolution wavelet methods to graphs (Coifman & Maggioni 2006). We build on these advances by showing how agents embedded in stochastic environments can learn novel representations for planning, and then modify or augment these representations to take into account rewards or the nature of actions. The approach is more broadly applicable to a wide variety of other areas in AI, including perception, information extraction, and robotics, but we confine our presentation to planning under uncertainty. The framework provides general ways of constructing multiscale representations of stochastic processes such as Markov chains and Markov decision processes.
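To make the two constructions concrete, the following Python/NumPy sketch (not taken from the paper) builds both kinds of representation on a small grid graph standing in for an MDP's state-space connectivity: the Fourier view keeps the smoothest eigenvectors of the normalized graph Laplacian as basis functions, and the dilation view applies powers of the random-walk diffusion operator to a unit basis vector. The grid_adjacency helper, the graph itself, and all parameter choices are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def grid_adjacency(n):
    """Adjacency matrix of an n x n grid graph (an illustrative stand-in for
    the state-space connectivity of an MDP)."""
    N = n * n
    W = np.zeros((N, N))
    for r in range(n):
        for c in range(n):
            i = r * n + c
            if r + 1 < n:                      # edge to the cell below
                j = (r + 1) * n + c
                W[i, j] = W[j, i] = 1.0
            if c + 1 < n:                      # edge to the cell to the right
                j = r * n + (c + 1)
                W[i, j] = W[j, i] = 1.0
    return W

def laplacian_basis(W, k):
    """Fourier view: the k smoothest eigenvectors of the normalized graph
    Laplacian L = I - D^{-1/2} W D^{-1/2} serve as basis functions over states."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(W.shape[0]) - D_inv_sqrt @ W @ D_inv_sqrt
    eigvals, eigvecs = np.linalg.eigh(L)       # symmetric eigendecomposition, ascending
    return eigvecs[:, :k]                      # columns = basis functions

# Example: a 10x10 grid world and 20 Laplacian basis functions.
W = grid_adjacency(10)
Phi = laplacian_basis(W, k=20)                 # shape (100, 20): one feature row per state

# Dilation view (wavelet-style): powers of the random-walk operator T = D^{-1} W
# progressively spread a unit basis vector over the graph. This sketches only the
# dilation idea, not the full diffusion-wavelet construction.
T = np.diag(1.0 / W.sum(axis=1)) @ W
delta = np.zeros(W.shape[0])
delta[0] = 1.0
spread = np.linalg.matrix_power(T, 8) @ delta
```

In the RPI framework described above, basis functions produced in this way would feed the inner loop, which searches for the best value function and policy expressible as a linear combination of the columns of Phi.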
Our approach builds on recent work in machine learning that focuses on modeling the nonlinear geometry of the space underlying many real-world datasets. Nonlinear dimensionality reduction methods have recently emerged that empirically model and recover the underlying manifold, for example multidimensional scaling (Borg & Groenen 1996), LLE (Roweis & Saul 2000), ISOMAP (Tenenbaum, de Silva, & Langford 2000), Laplacian eigenmaps (Belkin & Niyogi 2003), and diffusion maps (Coifman et al. 2005). These techniques can be significantly more powerful than well-studied linear Euclidean subspace methods such as principal components analysis (Jolliffe 1986).

Markov Decision Processes and Function Approximation

A Markov decision process M = ⟨S, A, P^a_{ss'}, R^a_{ss'}, γ⟩ is defined by a set of states S, a set of actions A, a transition model P^a_{ss'} specifying the probability of reaching state s' when performing action a in state s, a reward model R^a_{ss'} specifying the scalar reward received on that transition, and a discount factor γ ∈ [0, 1) (Puterman 1994). The discount factor balances the agent's desire for immediate (small γ) versus future (large γ) rewards.
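As a concrete illustration of "the best value function representable within the linear span" of a set of basis functions, the following NumPy sketch evaluates a fixed policy exactly on a toy chain MDP and then projects the result onto a small feature matrix by least squares. The chain, the polynomial features, and the model-based evaluation are illustrative assumptions; the RPI inner loop would instead use a sample-based method such as least-squares policy iteration.

```python
import numpy as np

# Toy chain MDP under a fixed policy: from each state, move right with
# probability 0.9 and stay put with probability 0.1 (illustrative only).
n, gamma = 20, 0.9
P = np.zeros((n, n))
for s in range(n):
    P[s, min(s + 1, n - 1)] += 0.9
    P[s, s] += 0.1
r = np.zeros(n)
r[-1] = 1.0                                    # reward only in the last state

# Exact policy evaluation: V = (I - gamma * P)^{-1} r.
V_exact = np.linalg.solve(np.eye(n) - gamma * P, r)

# Hypothetical basis: low-order polynomial features of the normalized state index.
x = np.arange(n) / (n - 1)
Phi = np.vstack([x**k for k in range(4)]).T    # shape (n, 4)

# Best approximation of V within the span of the basis (least-squares projection).
w, *_ = np.linalg.lstsq(Phi, V_exact, rcond=None)
V_hat = Phi @ w
print("max approximation error:", np.max(np.abs(V_hat - V_exact)))
```

In practice the exact value function is not available; a sample-based inner loop replaces this projection with an estimate from trajectories, but the role of the basis, defining the space of representable value functions, is the same.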

References

[1] Chris Watkins et al. Learning from delayed rewards, 1989.

[2] I. T. Jolliffe et al. Generalizations and Adaptations of Principal Component Analysis, 1986.

[3] Alicia P. Wolfe et al. Identifying useful subgoals in reinforcement learning by local graph partitioning, 2005, ICML.

[4] Mikhail Belkin et al. Laplacian Eigenmaps for Dimensionality Reduction and Data Representation, 2003, Neural Computation.

[5] Petros Drineas et al. On the Nyström Method for Approximating a Gram Matrix for Improved Kernel-Based Learning, 2005, J. Mach. Learn. Res.

[6] Sridhar Mahadevan et al. Representation Policy Iteration, 2005, UAI.

[7] Andrew G. Barto et al. Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density, 2001, ICML.

[8] Sridhar Mahadevan et al. Learning Representation and Control in Continuous Markov Decision Processes, 2006, AAAI.

[9] M. Maggioni et al. Geometric diffusions as a tool for harmonic analysis and structure definition of data, Part I: Diffusion maps, 2005.

[10] Richard S. Sutton et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.

[11] Arthur D. Szlam et al. Diffusion wavelet packets, 2006.

[12] Sridhar Mahadevan et al. Learning state-action basis functions for hierarchical MDPs, 2007, ICML.

[13] Lihong Li et al. Analyzing feature generation for value-function approximation, 2007, ICML.

[14] R. Coifman et al. A general framework for adaptive regularization based on diffusion processes on graphs, 2006.

[15] S. Mallat. A wavelet tour of signal processing, 1998.

[16] Andrew W. Moore et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function, 1994, NIPS.

[17] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.

[18] Sridhar Mahadevan et al. Fast direct policy evaluation using multiscale analysis of Markov diffusion processes, 2006, ICML.

[19] S. T. Roweis et al. Nonlinear dimensionality reduction by locally linear embedding, 2000, Science.

[20] Shie Mannor et al. Automatic basis function construction for approximate dynamic programming and reinforcement learning, 2006, ICML.

[21] Sridhar Mahadevan et al. Constructing basis functions from directed graphs for value function approximation, 2007, ICML.

[22] J. Tenenbaum et al. A global geometric framework for nonlinear dimensionality reduction, 2000, Science.

[23] Doina Precup et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.

[24] Michail G. Lagoudakis et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.

[25] Shie Mannor et al. Q-Cut - Dynamic Discovery of Sub-goals in Reinforcement Learning, 2002, ECML.

[26] R. Taylor et al. The Numerical Treatment of Integral Equations, 1978.

[27] P. Groenen et al. Modern Multidimensional Scaling: Theory and Applications, 1999.

[28] Richard S. Sutton et al. Dimensions of Reinforcement Learning, 1998.

[29] Marek Petrik et al. An Analysis of Laplacian Methods for Value Function Approximation in MDPs, 2007, IJCAI.

[30] W. Smart et al. Manifold Representations for Value-Function Approximation, 2004.

[31] Fan Chung et al. Spectral Graph Theory, 1996.

[32] Sridhar Mahadevan et al. Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions, 2005, NIPS.

[33] R. Coifman et al. Diffusion Wavelets, 2004.

[34] R. Weale. Vision. A Computational Investigation into the Human Representation and Processing of Visual Information. David Marr, 1983.

[35] Saul Amarel et al. On representations of problems of reasoning about actions, 1968.

[37] S. Rosenberg. The Laplacian on a Riemannian Manifold, 1997.

[38] Ann B. Lee et al. Geometric diffusions as a tool for harmonic analysis and structure definition of data: diffusion maps, 2005, Proceedings of the National Academy of Sciences of the United States of America.

[39] Shie Mannor et al. Basis Function Adaptation in Temporal Difference Reinforcement Learning, 2005, Ann. Oper. Res.

[40] Andrew G. Barto et al. Reinforcement learning, 1998.

[41] G. Dunteman. Principal Components Analysis, 1989.