Does the Markov Decision Process Fit the Data: Testing for the Markov Property in Sequential Decision Making
Chengchun Shi | Rui Song | Wenbin Lu | Ling Leng | Runzhe Wan
[1] Heping Zhang,et al. Conditional Distance Correlation , 2015, Journal of the American Statistical Association.
[2] Bernhard Schölkopf,et al. Kernel-based Conditional Independence Test and Application in Causal Discovery , 2011, UAI.
[3] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method , 2005, ECML.
[4] H. White,et al. A FLEXIBLE NONPARAMETRIC TEST FOR CONDITIONAL INDEPENDENCE , 2013, Econometric Theory.
[5] Bin Chen,et al. TESTING FOR THE MARKOV PROPERTY IN TIME SERIES , 2011, Econometric Theory.
[6] Leo Breiman,et al. Random Forests , 2001, Machine Learning.
[7] Alexandre Belloni,et al. A high dimensional Central Limit Theorem for martingales, with applications to context tree models , 2018, 1809.02741.
[8] Peter Stone,et al. Deep Recurrent Q-Learning for Partially Observable MDPs , 2015, AAAI Fall Symposia.
[9] Gérard Biau,et al. Analysis of a Random Forests Model , 2010, J. Mach. Learn. Res..
[10] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[11] David Rodbard,et al. Interpretation of continuous glucose monitoring data: glycemic variability and quality of glycemic control. , 2009, Diabetes technology & therapeutics.
[12] David Silver,et al. Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.
[13] Leslie Pack Kaelbling,et al. Acting Optimally in Partially Observable Stochastic Domains , 1994, AAAI.
[14] R. Tweedie,et al. Rates of convergence of the Hastings and Metropolis algorithms , 1996 .
[15] Yisong Yue,et al. Batch Policy Learning under Constraints , 2019, ICML.
[16] Ramachandran S Vasan,et al. Cohort Profile: The Framingham Heart Study (FHS): overview of milestones in cardiovascular epidemiology. , 2015, International journal of epidemiology.
[17] L. Tierney. Markov Chains for Exploring Posterior Distributions , 1994 .
[18] R. C. Bradley. Basic properties of strong mixing conditions. A survey and some open questions , 2005, math/0511078.
[19] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..
[20] Nicolai Meinshausen,et al. Quantile Regression Forests , 2006, J. Mach. Learn. Res..
[21] Thomas B. Berrett,et al. The conditional permutation test for independence while controlling for confounders , 2018, Journal of the Royal Statistical Society: Series B (Statistical Methodology).
[22] Yongmiao Hong,et al. CHARACTERISTIC FUNCTION BASED TESTING FOR CONDITIONAL INDEPENDENCE: A NONPARAMETRIC REGRESSION APPROACH , 2017, Econometric Theory.
[23] C. J. Stone,et al. Consistent Nonparametric Regression , 1977 .
[24] Cynthia R. Marling,et al. The OhioT1DM Dataset for Blood Glucose Level Prediction: Update 2020 , 2020, KDH@ECAI.
[25] Yingbin Liang,et al. Finite-Sample Analysis for SARSA with Linear Function Approximation , 2019, NeurIPS.
[26] Peter Bühlmann,et al. Estimating High-Dimensional Directed Acyclic Graphs with the PC-Algorithm , 2007, J. Mach. Learn. Res..
[27] Xiaohong Chen,et al. Optimal Uniform Convergence Rates and Asymptotic Normality for Series Estimators Under Weak Dependence and Weak Conditions , 2014, 1412.6020.
[28] Johannes Schmidt-Hieber,et al. Nonparametric regression using deep neural networks with ReLU activation function , 2017, The Annals of Statistics.
[29] Bernard Bercu,et al. Exponential inequalities for self-normalized martingales with applications , 2007, 0707.3715.
[30] J. Robins,et al. Double/Debiased Machine Learning for Treatment and Structural Parameters , 2017 .
[31] H. White,et al. Testing Conditional Independence Via Empirical Likelihood , 2014 .
[32] C. F. Wu. JACKKNIFE, BOOTSTRAP AND OTHER RESAMPLING METHODS IN REGRESSION ANALYSIS , 2008 .
[33] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[34] Jalaj Bhandari,et al. A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation , 2018, COLT.
[35] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[36] Kengo Kato,et al. Detailed proof of Nazarov's inequality , 2017, 1711.10696.