Simulation-based Uniform Value Function Estimates of Markov Decision Processes
[1] Eitan Altman,et al. Rate of Convergence of Empirical Measures and Costs in Controlled Markov Chains and Transient Optimality , 1994, Math. Oper. Res..
[2] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[3] Persi Diaconis,et al. Iterated Random Functions , 1999, SIAM Rev..
[4] David Gamarnik. Extension of the PAC framework to finite and countable Markov chains , 2003, IEEE Trans. Inf. Theory.
[5] M. Ledoux. The concentration of measure phenomenon , 2001 .
[6] Shaler Stidham,et al. Optimal Control of Markov Chains , 2000 .
[7] V. Vapnik,et al. Necessary and Sufficient Conditions for the Uniform Convergence of Means to their Expectations , 1982 .
[8] Martin Pesendorfer,et al. Identification and Estimation of Dynamic Games , 2003 .
[9] H. Chernoff. A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations , 1952 .
[10] Vladimir Vapnik,et al. On the uniform convergence of relative frequencies of events to their probabilities , 1971 .
[11] V. Borkar. Topics in controlled Markov chains , 1991 .
[12] Mathukumalli Vidyasagar,et al. System identification: a learning theory approach , 2001, Proceedings of the 40th IEEE Conference on Decision and Control (Cat. No.01CH37228).
[13] P. Massart. Some applications of concentration inequalities to statistics , 2000 .
[14] Mathukumalli Vidyasagar,et al. Learning And Generalization , 2002 .
[15] M. Talagrand. Concentration of measure and isoperimetric inequalities in product spaces , 1994, math/9406212.
[16] A. Dembo,et al. A note on uniform laws of averages for dependent processes , 1993 .
[17] László Györfi,et al. A Probabilistic Theory of Pattern Recognition , 1996, Stochastic Modelling and Applied Probability.
[18] D. Pollard. Empirical Processes: Theory and Applications , 1990 .
[19] D. Pollard. Convergence of stochastic processes , 1984 .
[20] Kazuoki Azuma. Weighted Sums of Certain Dependent Random Variables , 1967 .
[21] V. Bentkus. On Hoeffding’s inequalities , 2004, math/0410159.
[22] Peter L. Bartlett,et al. Infinite-Horizon Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[23] J. Steele. Probability theory and combinatorial optimization , 1987 .
[24] Umesh V. Vazirani,et al. A Markovian extension of Valiant's learning model , 1990, Proceedings [1990] 31st Annual Symposium on Foundations of Computer Science.
[25] Peter L. Bartlett,et al. Neural Network Learning - Theoretical Foundations , 1999 .
[26] John N. Tsitsiklis,et al. On the Empirical State-Action Frequencies in Markov Decision Processes Under General Policies , 2005, Math. Oper. Res..
[27] P. Doukhan. Mixing: Properties and Examples , 1994 .
[28] Noga Alon,et al. Scale-sensitive dimensions, uniform convergence, and learnability , 1997, JACM.
[29] Bin Yu. Rates of Convergence for Empirical Processes of Stationary Mixing Sequences , 1994 .
[30] Paul-Marie Samson,et al. Concentration of measure inequalities for Markov chains and $\Phi$-mixing processes , 2000 .
[31] John N. Tsitsiklis,et al. The Complexity of Markov Decision Processes , 1987, Math. Oper. Res..
[32] Sean P. Meyn,et al. Relative entropy and exponential deviation bounds for general Markov chains , 2005, Proceedings. International Symposium on Information Theory, 2005. ISIT 2005..
[33] S. Shankar Sastry,et al. Decentralized nonlinear model predictive control of multiple flying robots , 2003, 42nd IEEE International Conference on Decision and Control (IEEE Cat. No.03CH37475).
[34] John N. Tsitsiklis,et al. Approximate Gradient Methods in Policy-Space Optimization of Markov Reward Processes , 2003, Discret. Event Dyn. Syst..
[35] Yishay Mansour,et al. Approximate Planning in Large POMDPs via Reusable Trajectories , 1999, NIPS.
[36] J. K. Hunter,et al. Measure Theory , 2007 .
[37] Leonid Peshkin,et al. Bounds on Sample Size for Policy Evaluation in Markov Environments , 2001, COLT/EuroCOLT.
[38] Mathukumalli Vidyasagar,et al. Learning and Generalization: With Applications to Neural Networks , 2002 .
[39] Leslie G. Valiant,et al. A theory of the learnable , 1984, CACM.
[40] A. W. van der Vaart,et al. Uniform Central Limit Theorems , 2001 .
[41] A. Kolmogorov,et al. Entropy and ε-capacity of sets in functional spaces , 1961 .
[42] M. Talagrand. A new look at independence , 1996 .
[43] P. Kumar,et al. Learning dynamical systems in a stationary environment , 1998 .
[44] Jon A. Wellner,et al. Weak Convergence and Empirical Processes: With Applications to Statistics , 1996 .
[45] Michael I. Jordan,et al. PEGASUS: A policy search method for large MDPs and POMDPs , 2000, UAI.
[46] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables , 1963 .
[47] Colin McDiarmid,et al. On the method of bounded differences , 1989, Surveys in Combinatorics.
[48] S. Geer. On Hoeffding's Inequality for Dependent Random Variables , 2002 .
[49] Alon Itai,et al. Learnability with Respect to Fixed Distributions , 1991, Theor. Comput. Sci..
[50] David Haussler,et al. Decision Theoretic Generalizations of the PAC Model for Neural Net and Other Learning Applications , 1992, Inf. Comput..