[1] Philippe Jorion. Value at Risk: The New Benchmark for Managing Financial Risk, 2000.
[2] Justo Puerto, et al. Dynamic programming analysis of the TV game "Who wants to be a millionaire?", 2007, Eur. J. Oper. Res.
[3] Shie Mannor, et al. Percentile optimization in uncertain Markov decision processes with application to efficient exploration, 2007, ICML '07.
[4] Peter C. Fishburn, et al. An axiomatic characterization of skew-symmetric bilinear functionals, with applications to utility theory, 1981.
[5] Pieter Abbeel, et al. Autonomous Helicopter Aerobatics through Apprenticeship Learning, 2010, Int. J. Robotics Res.
[6] Olivier Spanjaard, et al. Solving MDPs with Skew Symmetric Bilinear Utility Functions, 2015, IJCAI.
[7] E. McClennen. Rationality and Dynamic Choice: Foundational Explorations, 1996.
[8] Paul Weng. Ordinal Decision Models for Markov Decision Processes, 2012, ECAI.
[9] Eyke Hüllermeier, et al. Preference-based reinforcement learning: evolutionary direct policy search using a preference-based racing algorithm, 2014, Machine Learning.
[10] Dirk Van den Poel, et al. Benefits of quantile regression for the analysis of customer lifetime value in a contractual setting: An application in financial services, 2009, Expert Syst. Appl.
[11] M. Rostek. Quantile Maximization in Decision Theory, 2009.
[12] Véronique Bruyère, et al. Meet Your Expectations With Guarantees: Beyond Worst-Case Synthesis in Quantitative Games, 2013, STACS.
[13] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint, 2008.
[14] Vivek S. Borkar, et al. A Learning Scheme for Blackwell's Approachability in MDPs and Stackelberg Stochastic Games, 2014.
[15] Gerald Tesauro, et al. Temporal Difference Learning and TD-Gammon, 1995, J. Int. Comput. Games Assoc.
[16] Vivek S. Borkar, et al. Risk-constrained Markov decision processes, 2010, 49th IEEE Conference on Decision and Control (CDC).
[17] Marc Schoenauer, et al. Preference-based Reinforcement Learning, 2011.
[18] Paolo Viappiani, et al. Model-Free Reinforcement Learning with Skew-Symmetric Bilinear Utilities, 2016, UAI.
[19] Gildas Jeantet, et al. Resolute Choice in Sequential Decision Problems with Multiple Priors, 2011, IJCAI.
[20] Nicole Bäuerle, et al. Markov Decision Processes with Average-Value-at-Risk criteria, 2011, Math. Methods Oper. Res.
[21] Eyke Hüllermeier, et al. Qualitative Multi-Armed Bandits: A Quantile-Based Approach, 2015, ICML.
[22] Michèle Sebag, et al. APRIL: Active Preference-learning based Reinforcement Learning, 2012, ECML/PKDD.
[23] Sven Koenig, et al. Functional Value Iteration for Decision-Theoretic Planning with General Utility Functions, 2006, AAAI.
[24] Stella X. Yu, et al. Optimization Models for the First Arrival Target Distribution Function in Discrete Time, 1998.
[25] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[26] Hugo Gimbert, et al. Pure Stationary Optimal Strategies in Markov Decision Processes, 2007, STACS.
[27] D. Blackwell. An analog of the minimax theorem for vector payoffs, 1956.
[28] Jerzy A. Filar, et al. Variance-Penalized Markov Decision Processes, 1989, Math. Oper. Res.
[29] Mohammad Ghavamzadeh, et al. Algorithms for CVaR Optimization in MDPs, 2014, NIPS.
[30] Richard Wolski, et al. QPRED: Using Quantile Predictions to Improve Power Usage for Private Clouds, 2017, 2017 IEEE 10th International Conference on Cloud Computing (CLOUD).
[31] D. J. White. Utility, probabilistic constraints, mean and variance of discounted rewards in Markov decision processes, 1987.
[32] Eyke Hüllermeier, et al. Preference-based reinforcement learning: a formal framework and a policy iteration algorithm, 2012, Mach. Learn.
[33] Baining Guo, et al. Spoken dialogue management as planning and acting under uncertainty, 2001, INTERSPEECH.
[34] Jean-Yves Jaffray. Implementing Resolute Choice Under Uncertainty, 1998, UAI.
[35] Jerzy A. Filar, et al. Percentiles and Markovian decision processes, 1983.
[36] Andrew Y. Ng, et al. Algorithms for Inverse Reinforcement Learning, 2000, ICML.
[37] Paul Weng, et al. Markov Decision Processes with Ordinal Rewards: Reference Point-Based Preferences, 2011, ICAPS.
[38] Bruno Zanuttini, et al. Interactive Value Iteration for Markov Decision Processes with Unknown Rewards, 2013, IJCAI.
[39] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[40] Sven Koenig, et al. Risk-Sensitive Planning with One-Switch Utility Functions: Value Iteration, 2005, AAAI.
[41] Craig Boutilier, et al. Regret-based Reward Elicitation for Markov Decision Processes, 2009, UAI.
[42] Michal Valko, et al. Extreme bandits, 2014, NIPS.
[43] Mickael Randour, et al. Percentile queries in multi-dimensional Markov decision processes, 2014, CAV.
[44] Jia Yuan Yu, et al. Sample Complexity of Risk-Averse Bandit-Arm Selection, 2013, IJCAI.