Reinforcement learning in the presence of rare events
[1] C. Cornell, et al. Adaptive Importance Sampling, 1990.
[2] James A. Bucklew, et al. Introduction to Rare Event Simulation, 2010.
[3] Shie Mannor, et al. Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems, 2006, J. Mach. Learn. Res..
[4] Sanjoy Dasgupta, et al. Off-Policy Temporal Difference Learning with Function Approximation, 2001, ICML.
[5] Peter W. Glynn, et al. Stochastic Simulation: Algorithms and Analysis, 2007.
[6] Doina Precup, et al. Between MDPs and Semi-MDPs: Learning, Planning, and Representing Knowledge at Multiple Temporal Scales, 1998.
[7] Richard S. Sutton, et al. Reinforcement Learning, 1992, Handbook of Machine Learning.
[8] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 1998, Machine Learning.
[9] J. Beck, et al. A new adaptive importance sampling scheme for reliability calculations, 1999.
[10] Paul Bratley, et al. A guide to simulation, 1983.
[11] N. Metropolis, et al. The Monte Carlo method, 1949.
[12] Sumit Roy, et al. Adaptive Importance Sampling, 1993, IEEE J. Sel. Areas Commun..
[13] Paul Bratley, et al. A guide to simulation (2nd ed.), 1986.
[14] Michael Devetsikiotis, et al. An algorithmic approach to the optimization of importance sampling parameters in digital communication system simulation, 1993, IEEE Trans. Commun..
[15] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
[16] Dennis D. Cox, et al. Adaptive importance sampling on discrete Markov chains, 1999.
[17] Michael G. Madden, et al. Transfer of Experience Between Reinforcement Learning Environments with Progressive Difficulty, 2004, Artificial Intelligence Review.
[18] R. Bellman. Dynamic Programming, 1957, Science.
[19] V. F. Nicola, et al. Adaptive importance sampling simulation of queueing networks, 2000, Winter Simulation Conference Proceedings.
[20] Peter W. Glynn, et al. A Markov chain perspective on adaptive Monte Carlo algorithms, 2001, Proceedings of the 2001 Winter Simulation Conference.
[21] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res..
[22] Peter Stone, et al. Behavior transfer for value-function-based reinforcement learning, 2005, AAMAS '05.
[24] Richard S. Sutton, et al. Reinforcement Learning of Local Shape in the Game of Go, 2007, IJCAI.
[25] C. Bucher. Adaptive sampling — an iterative fast Monte Carlo procedure, 1988.
[26] Manuela M. Veloso, et al. Team-Partitioned, Opaque-Transition Reinforcement Learning, 1998, RoboCup.
[28] Pieter-Tjerk de Boer, et al. Estimating buffer overflows in three stages using cross-entropy, 2002, Proceedings of the Winter Simulation Conference.
[29] John N. Tsitsiklis, et al. Bias and Variance Approximation in Value Function Estimates, 2007, Manag. Sci..
[30] M. Jackson. A Survey of Models of Network Formation: Stability and Efficiency, 2003.
[31] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[32] Peter L. Bartlett, et al. Infinite-Horizon Policy-Gradient Estimation, 2001, J. Artif. Intell. Res..
[33] Avishai Mandelbaum, et al. Queueing Models of Call Centers: An Introduction, 2002, Ann. Oper. Res..
[34] H. Robbins. A Stochastic Approximation Method, 1951.
[35] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[36] Andrew G. Barto, et al. Improving Elevator Performance Using Reinforcement Learning, 1995, NIPS.
[37] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[38] Donald L. Iglehart, et al. Importance sampling for stochastic simulations, 1989.
[39] P. Balaban, et al. A Modified Monte-Carlo Simulation Technique for the Evaluation of Error Rate in Digital Communication Systems, 1980, IEEE Trans. Commun..
[40] Perwez Shahabuddin, et al. Importance sampling for the simulation of highly reliable Markovian systems, 1994.
[41] M. Schwind, et al. A Reinforcement Learning Approach for Supply Chain Management, 2022.
[42] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[43] Larry Wasserman, et al. All of Statistics: A Concise Course in Statistical Inference, 2004.
[44] Paul Glasserman, et al. Rare-Event Simulation for Multistage Production-Inventory Systems, 1996.
[45] Vivek S. Borkar, et al. Adaptive Importance Sampling Technique for Markov Chains Using Stochastic Approximation, 2006, Oper. Res..
[46] Vivek S. Borkar, et al. A Simulation-Based Algorithm for Ergodic Control of Markov Chains Conditioned on Rare Events, 2006, J. Mach. Learn. Res..
[47] H. Kushner, et al. Stochastic Approximation and Recursive Algorithms and Applications, 2003.
[48] Dirk P. Kroese, et al. The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation and Machine Learning, 2004.
[49] Lih-Yuan Deng, et al. The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning, 2006, Technometrics.
[50] Stuart J. Russell, et al. Reinforcement Learning with Hierarchies of Machines, 1997, NIPS.
[51] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res..
[52] T. E. Booth. Exponential convergence for Monte Carlo particle transport, 1985.
[53] Moshe Zukerman, et al. A quantitative measure for telecommunications networks topology design, 2005, IEEE/ACM Transactions on Networking.
[54] Richard S. Sutton, et al. Reinforcement learning with replacing eligibility traces, 2004, Machine Learning.
[55] Philip Heidelberger, et al. Fast simulation of rare events in queueing and reliability models, 1993, ACM Trans. Model. Comput. Simul..
[56] John N. Tsitsiklis, et al. Bias and variance in value function estimation, 2004, ICML.
[57] Peter Stone, et al. Reinforcement Learning for RoboCup Soccer Keepaway, 2005, Adapt. Behav..
[58] Peter Auer, et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2003, J. Mach. Learn. Res..
[59] Eitan Altman, et al. A survey on networking games in telecommunications, 2006, Comput. Oper. Res..
[60] Wei Zhang, et al. A Reinforcement Learning Approach to Job-Shop Scheduling, 1995, IJCAI.
[61] Craig Kollman. Rare event simulation in radiation transport, 1993.
[62] Gerald Tesauro, et al. Temporal difference learning and TD-Gammon, 1995, CACM.
[63] Satinder Singh. Transfer of Learning by Composing Solutions of Elemental Sequential Tasks, 1992, Mach. Learn..
[64] Andrew G. Barto, et al. Reinforcement learning, 1998.
[65] Dirk P. Kroese, et al. The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation (Information Science and Statistics), 2004.
[66] Thomas G. Dietterich. The MAXQ Method for Hierarchical Reinforcement Learning, 1998, ICML.
[67] Peter L. Bartlett, et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning, 2001, J. Mach. Learn. Res..
[68] Manuela M. Veloso, et al. Team-Partitioned, Opaque-Transition Reinforcement Learning, 1999, AGENTS '99.