Learning Representation and Control in Markov Decision Processes: New Frontiers
[1] Clarence E. Rose,et al. What is tensor analysis? , 1938, Electrical Engineering.
[2] Saul Amarel,et al. On representations of problems of reasoning about actions , 1968 .
[3] P. Schweitzer. Perturbation theory and finite Markov chains , 1968 .
[4] M. Fiedler. Algebraic connectivity of graphs , 1973 .
[5] Jean-Pierre Serre,et al. Linear representations of finite groups , 1977, Graduate texts in mathematics.
[6] C. D. Meyer,et al. Generalized inverses of linear transformations , 1979 .
[7] J. Eells. Eigenvalues in Riemannian Geometry (Pure and Applied Mathematics: A Series of Monographs and Textbooks, 115) , 1985 .
[8] P. Schweitzer,et al. Generalized polynomial approximations in Markovian decision processes , 1985 .
[9] D. Bertsekas,et al. Adaptive aggregation methods for infinite horizon dynamic programming , 1989 .
[10] Stéphane Mallat,et al. A Theory for Multiresolution Signal Decomposition: The Wavelet Representation , 1989, IEEE Trans. Pattern Anal. Mach. Intell..
[11] G. Dunteman. Principal Components Analysis , 1989 .
[12] Devika Subramanian,et al. A Theory of Justified Reformulations , 1989, ML.
[13] G. Wahba. Spline models for observational data , 1990 .
[14] R. Coifman,et al. Fast wavelet transforms and numerical algorithms I , 1991 .
[15] V. N. Bogaevski,et al. Matrix Perturbation Theory , 1991 .
[16] S. Axler,et al. Harmonic Function Theory , 1992 .
[17] Ingrid Daubechies,et al. Ten Lectures on Wavelets , 1992 .
[18] C. Loan. Computational Frameworks for the Fast Fourier Transform , 1992 .
[19] C. Loan,et al. Approximation with Kronecker Products , 1992 .
[20] D. Gurarie. Symmetries and Laplacians: Introduction to Harmonic Analysis, Group Representations and Applications , 1992 .
[21] Sridhar Mahadevan,et al. Automatic Programming of Behavior-Based Robots Using Reinforcement Learning , 1991, Artif. Intell..
[22] J. A. López del Val,et al. Principal Components Analysis , 2018, Applied Univariate, Bivariate, and Multivariate Statistics Using Python.
[23] Peter Dayan,et al. Improving Generalization for Temporal Difference Learning: The Successor Representation , 1993, Neural Computation.
[24] Robert J. Plemmons,et al. Nonnegative Matrices in the Mathematical Sciences , 1979, Classics in Applied Mathematics.
[25] Iain MacLeod,et al. Generalised Matrix Inversion and Rank Computation by Successive Matrix Powering , 1994, Parallel Comput..
[26] Gerald Tesauro,et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.
[27] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[28] C. D. Meyer. Sensitivity of the Stationary Distribution of a Markov Chain , 1994, SIAM J. Matrix Anal. Appl..
[29] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.
[30] Peter Norvig,et al. Artificial Intelligence: A Modern Approach , 1995 .
[31] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..
[32] D. Cvetkovic,et al. Spectra of graphs : theory and application , 1995 .
[33] Wei Zhang,et al. A Reinforcement Learning Approach to job-shop Scheduling , 1995, IJCAI.
[34] Anders R. Kristensen,et al. Dynamic programming and Markov decision processes , 1996 .
[35] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[36] Fan Chung,et al. Spectral Graph Theory , 1996 .
[37] S. Rosenberg. The Laplacian on a Riemannian Manifold , 1997 .
[38] Robert Givan,et al. Model Minimization in Markov Decision Processes , 1997, AAAI/IAAI.
[39] Andrew W. Moore,et al. Barycentric Interpolators for Continuous Space and Time Reinforcement Learning , 1998, NIPS.
[40] Benjamin Van Roy. Learning and value function approximation in complex decision processes , 1998 .
[41] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[42] S. Mallat. A wavelet tour of signal processing , 1998 .
[43] Xi-Ren Cao,et al. The Relations Among Potentials, Perturbation Analysis, and Markov Decision Processes , 1998, Discret. Event Dyn. Syst..
[44] Vipin Kumar,et al. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs , 1998, SIAM J. Sci. Comput..
[45] Charles W. Anderson,et al. Using Temporal Neighborhoods to Adapt Function Approximators in Reinforcement Learning , 1999, IWANN.
[46] S. Mahadevan,et al. Solving Semi-Markov Decision Problems Using Average Reward Reinforcement Learning , 1999 .
[47] Justin A. Boyan,et al. Least-Squares Temporal Difference Learning , 1999, ICML.
[48] G. Micula,et al. Numerical Treatment of the Integral Equations , 1999 .
[49] Jesse Hoey,et al. SPUDD: Stochastic Planning using Decision Diagrams , 1999, UAI.
[50] Jianbo Shi,et al. Learning Segmentation by Random Walks , 2000, NIPS.
[51] J. Tenenbaum,et al. A global geometric framework for nonlinear dimensionality reduction. , 2000, Science.
[52] Yimin Wei,et al. Successive matrix squaring algorithm for computing the Drazin inverse , 2000, Appl. Math. Comput..
[53] S T Roweis,et al. Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.
[54] Christopher K. I. Williams,et al. Using the Nyström Method to Speed Up Kernel Machines , 2000, NIPS.
[55] Jesse Hoey,et al. APRICODD: Approximate Policy Construction Using Decision Diagrams , 2000, NIPS.
[56] F. Deutsch. Best approximation in inner product spaces , 2001 .
[57] P. Diaconis,et al. A geometric interpretation of the Metropolis-Hastings algorithm , 2001 .
[58] Sham M. Kakade,et al. A Natural Policy Gradient , 2001, NIPS.
[59] Michael I. Jordan,et al. On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.
[60] M. Eiermann,et al. Geometric aspects of the theory of Krylov subspace methods , 2001, Acta Numerica.
[61] Xin Wang,et al. Batch Value Function Approximation via Support Vectors , 2001, NIPS.
[62] Bernhard Schölkopf,et al. Sampling Techniques for Kernel Methods , 2001, NIPS.
[63] Andrew G. Barto,et al. Autonomous discovery of temporal abstractions from interaction with an environment , 2002 .
[64] P. Chebotarev,et al. Forest Matrices Around the Laplacian Matrix , 2002, math/0508178.
[65] Paul E. Utgoff,et al. Many-Layered Learning , 2002, Neural Computation.
[66] Chris Drummond,et al. Accelerating Reinforcement Learning by Composing Solutions of Automatically Identified Subtasks , 2011, J. Artif. Intell. Res..
[67] John M. Lee. Introduction to Smooth Manifolds , 2002 .
[68] Craig Boutilier,et al. Greedy linear value-approximation for factored Markov decision processes , 2002, AAAI/IAAI.
[69] C. D. Meyer,et al. Updating the stationary vector of an irreducible Markov chain , 2002 .
[70] Jitendra Malik,et al. Spectral Partitioning with Indefinite Kernels Using the Nyström Extension , 2002, ECCV.
[71] Craig Boutilier,et al. Value-Directed Compression of POMDPs , 2002, NIPS.
[72] Sridhar Mahadevan,et al. Recent Advances in Hierarchical Reinforcement Learning , 2003, Discret. Event Dyn. Syst..
[73] Balaraman Ravindran,et al. SMDP Homomorphisms: An Algebraic Approach to Abstraction in Semi-Markov Decision Processes , 2003, IJCAI.
[74] S. Shankar Sastry,et al. Autonomous Helicopter Flight via Reinforcement Learning , 2003, NIPS.
[75] Shobha Venkataraman,et al. Efficient Solution Algorithms for Factored MDPs , 2003, J. Artif. Intell. Res..
[76] Elias M. Stein,et al. Fourier Analysis: An Introduction , 2003 .
[77] Carl E. Rasmussen,et al. Gaussian Processes in Reinforcement Learning , 2003, NIPS.
[78] Stefan Schaal,et al. Reinforcement Learning for Humanoid Robotics , 2003 .
[79] Rémi Munos,et al. Error Bounds for Approximate Policy Iteration , 2003, ICML.
[80] Jeff G. Schneider,et al. Covariant policy search , 2003, IJCAI 2003.
[81] Shie Mannor,et al. Bayes Meets Bellman: The Gaussian Process Approach to Temporal Difference Learning , 2003, ICML.
[82] Daniel N. Rockmore,et al. Computing Isotypic Projections with the Lanczos Iteration , 2003, SIAM J. Matrix Anal. Appl..
[83] Benjamin Van Roy,et al. The Linear Programming Approach to Approximate Dynamic Programming , 2003, Oper. Res..
[84] Anthony Widjaja,et al. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2003, IEEE Transactions on Neural Networks.
[85] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..
[86] Yousef Saad,et al. Iterative methods for sparse linear systems , 2003 .
[87] H. Kushner,et al. Stochastic Approximation and Recursive Algorithms and Applications , 2003 .
[88] Alan M. Frieze,et al. Fast monte-carlo algorithms for finding low-rank approximations , 2004, JACM.
[89] Steven J. Bradtke,et al. Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.
[90] Shie Mannor,et al. Dynamic abstraction in reinforcement learning via clustering , 2004, ICML.
[91] R. Coifman,et al. Diffusion Wavelets , 2004 .
[92] Geoffrey E. Hinton,et al. Reinforcement Learning with Factored States and Actions , 2004, J. Mach. Learn. Res..
[93] P. Chebotarev,et al. On the Spectra of Nonsymmetric Laplacian Matrices , 2004, math/0508176.
[94] Peter Dayan,et al. Structure in the Space of Value Functions , 2002, Machine Learning.
[95] Mikhail Belkin,et al. Semi-Supervised Learning on Riemannian Manifolds , 2004, Machine Learning.
[96] Ann B. Lee,et al. Geometric diffusions as a tool for harmonic analysis and structure definition of data: diffusion maps. , 2005, Proceedings of the National Academy of Sciences of the United States of America.
[97] Sridhar Mahadevan,et al. Coarticulation: an approach for generating concurrent plans in Markov decision processes , 2005, ICML.
[98] Petros Drineas,et al. On the Nyström Method for Approximating a Gram Matrix for Improved Kernel-Based Learning , 2005, J. Mach. Learn. Res..
[99] Ann B. Lee,et al. Geometric diffusions as a tool for harmonic analysis and structure definition of data: multiscale methods. , 2005, Proceedings of the National Academy of Sciences of the United States of America.
[100] Ronald Rosenfeld,et al. Semi-supervised learning with graphs , 2005 .
[101] Sridhar Mahadevan,et al. Representation Policy Iteration , 2005, UAI.
[102] Mauro Maggioni,et al. Geometric diffusions for the analysis of data from sensor networks , 2005, Current Opinion in Neurobiology.
[103] Christian P. Robert,et al. Monte Carlo Statistical Methods (Springer Texts in Statistics) , 2005 .
[104] M. Maggioni,et al. Geometric Diffusions as a Tool for Harmonic Analysis and Structure Definition of Data Part I: Diffusion Maps , 2005 .
[105] John D. Lafferty,et al. Diffusion Kernels on Statistical Manifolds , 2005, J. Mach. Learn. Res..
[106] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[107] Sridhar Mahadevan,et al. Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions , 2005, NIPS.
[108] F. Chung. Laplacians and the Cheeger Inequality for Directed Graphs , 2005 .
[109] Sridhar Mahadevan,et al. Proto-value functions: developmental reinforcement learning , 2005, ICML.
[110] Mark Herbster,et al. Online learning over graphs , 2005, ICML.
[111] Liming Xiang,et al. Kernel-Based Reinforcement Learning , 2006, ICIC.
[112] Sridhar Mahadevan,et al. Learning Representation and Control in Continuous Markov Decision Processes , 2006, AAAI.
[113] Sridhar Mahadevan,et al. Fast direct policy evaluation using multiscale analysis of Markov diffusion processes , 2006, ICML.
[114] Stephen P. Boyd,et al. Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.
[115] Steven M. LaValle,et al. Planning algorithms , 2006 .
[116] Milos Hauskrecht,et al. Learning Basis Functions in Hybrid Domains , 2006, AAAI.
[117] S. Mahadevan,et al. Proto-transfer Learning in Markov Decision Processes Using Spectral Methods , 2006 .
[118] Amy Nicole Langville,et al. Updating Markov Chains with an Eye on Google's PageRank , 2005, SIAM J. Matrix Anal. Appl..
[119] Thomas J. Walsh,et al. Towards a Unified Theory of State Abstraction for MDPs , 2006, AI&M.
[120] Nathaniel E. Helwig,et al. An Introduction to Linear Algebra , 2006 .
[121] Arthur D. Szlam,et al. Diffusion wavelet packets , 2006 .
[122] John S. Caughman IV,et al. Kernels of Directed Graph Laplacians , 2006, Electron. J. Comb..
[123] G. Swaminathan. Robot Motion Planning , 2006 .
[124] Sridhar Mahadevan,et al. Constructing basis functions from directed graphs for value function approximation , 2007, ICML '07.
[125] Sridhar Mahadevan,et al. Learning state-action basis functions for hierarchical MDPs , 2007, ICML '07.
[126] Ulrike von Luxburg,et al. Graph Laplacians and their Convergence on Random Neighborhood Graphs , 2006, J. Mach. Learn. Res..
[127] Sridhar Mahadevan,et al. Hierarchical Average Reward Reinforcement Learning , 2007, J. Mach. Learn. Res..
[128] Chang Wang,et al. Compact Spectral Bases for Value Function Approximation Using Kronecker Factorization , 2007, AAAI.
[129] Michel Verleysen,et al. Nonlinear Dimensionality Reduction , 2021, Computer Vision.
[130] Abhijit Gosavi,et al. Self-Improving Factory Simulation using Continuous-time Average-Reward Reinforcement Learning , 2007 .
[131] Warren B. Powell,et al. Approximate Dynamic Programming: Solving the Curses of Dimensionality (Wiley Series in Probability and Statistics) , 2007 .
[132] Peter F. Stadler,et al. Laplacian Eigenvectors of Graphs , 2007 .
[133] Marek Petrik,et al. An Analysis of Laplacian Methods for Value Function Approximation in MDPs , 2007, IJCAI.
[134] Lihong Li,et al. Analyzing feature generation for value-function approximation , 2007, ICML '07.
[135] A. Kaveh,et al. Block diagonalization of Laplacian matrices of symmetric graphs via group theory , 2007 .
[136] Sridhar Mahadevan,et al. Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes , 2007, J. Mach. Learn. Res..
[137] M. Maggioni,et al. Universal Local Parametrizations via Heat Kernels and Eigenfunctions of the Laplacian , 2007, 0709.1975.
[138] Jonathan P. How,et al. Approximate dynamic programming using support vector regression , 2008, 2008 47th IEEE Conference on Decision and Control.
[139] Lihong Li,et al. An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning , 2008, ICML '08.
[140] Stéphane Mallat,et al. A Wavelet Tour of Signal Processing, Third Edition: The Sparse Way , 2008 .
[141] Sridhar Mahadevan. Fast Spectral Learning using Lanczos Eigenspace Projections , 2008, AAAI.
[142] Sridhar Mahadevan,et al. Representation Discovery using Harmonic Analysis , 2008, Representation Discovery using Harmonic Analysis.
[143] Von-Wun Soo,et al. Graph Laplacian based transfer learning in reinforcement learning , 2008, AAMAS.
[144] S. Mahadevan,et al. Action-based representation discovery in Markov decision processes , 2009 .
[145] Panos M. Pardalos,et al. Approximate dynamic programming: solving the curses of dimensionality , 2009, Optim. Methods Softw..
[146] U. Rieder,et al. Markov Decision Processes , 2010 .
[147] Robert H. Halstead,et al. Matrix Computations , 2011, Encyclopedia of Parallel Computing.
[148] Eduardo F. Morales,et al. An Introduction to Reinforcement Learning , 2011 .
[149] Christian Robert. Monte Carlo Methods in Statistics , 2011, International Encyclopedia of Statistical Science.