Distributed Artificial Intelligence: Second International Conference, DAI 2020, Nanjing, China, October 24–27, 2020, Proceedings
Yang Yu | Edith Elkind | Matthew E. Taylor | Yang Gao
[1] Jakub W. Pachocki, et al. Learning dexterous in-hand manipulation, 2018, Int. J. Robotics Res.
[2] Marcin Andrychowicz, et al. Parameter Space Noise for Exploration, 2017, ICLR.
[3] Marco Wiering, et al. Multi-Agent Reinforcement Learning for Traffic Light Control, 2000.
[4] Randal W. Beard, et al. A decentralized scheme for spacecraft formation flying via the virtual structure approach, 2003, Proceedings of the 2003 American Control Conference.
[5] Andreas Krause, et al. Safe Model-based Reinforcement Learning with Stability Guarantees, 2017, NIPS.
[6] Kimmo Berg, et al. Exclusion Method for Finding Nash Equilibrium in Multiplayer Games, 2017, AAAI.
[7] Andrew Y. Ng, et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, 1999, ICML.
[8] Mathijs de Weerdt, et al. Planning under Uncertainty for Coordinating Infrastructural Maintenance, 2013, ICAPS.
[9] Jonathan P. How, et al. Socially aware motion planning with deep reinforcement learning, 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[10] Pieter Abbeel, et al. Meta Learning Shared Hierarchies, 2017, ICLR.
[11] Kenneth O. Stanley, et al. Go-Explore: a New Approach for Hard-Exploration Problems, 2019, ArXiv.
[12] René M. B. M. de Koster, et al. A review of design and control of automated guided vehicle systems, 2006, Eur. J. Oper. Res.
[13] Mathijs de Weerdt, et al. Solving Transition-Independent Multi-Agent MDPs with Sparse Interactions, 2015, AAAI.
[14] Sergey Levine, et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, 2017, ICML.
[15] Francesco Borrelli, et al. Decentralized receding horizon control for large scale dynamically decoupled systems, 2009, Autom.
[16] Kate Saenko, et al. Learning Multi-Level Hierarchies with Hindsight, 2017, ICLR.
[17] Tom Schaul, et al. Universal Value Function Approximators, 2015, ICML.
[18] Hui Cheng, et al. Decomposed Deep Reinforcement Learning for Robotic Control, 2020, AAMAS.
[19] Marina L. Gavrilova, et al. Roadmap-Based Path Planning - Using the Voronoi Diagram for a Clearance-Based Shortest Path, 2008, IEEE Robotics & Automation Magazine.
[20] Ming Zhou, et al. Mean Field Multi-Agent Reinforcement Learning, 2018, ICML.
[21] Sergey Levine, et al. End-to-End Training of Deep Visuomotor Policies, 2015, J. Mach. Learn. Res.
[22] Csaba Szepesvári, et al. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[23] Bart De Schutter, et al. Decentralized Reinforcement Learning Control of a Robotic Manipulator, 2006, 2006 9th International Conference on Control, Automation, Robotics and Vision.
[24] Qiang Liu, et al. Learning to Explore via Meta-Policy Gradient, 2018, ICML.
[25] Jonathan P. How, et al. Collision Avoidance in Pedestrian-Rich Environments With Deep Reinforcement Learning, 2019, IEEE Access.
[26] Sergey Levine, et al. Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models, 2015, ArXiv.
[27] Doina Precup, et al. Off-Policy Deep Reinforcement Learning without Exploration, 2018, ICML.
[28] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[29] R. Olfati-Saber, et al. Consensus Filters for Sensor Networks and Distributed Sensor Fusion, 2005, Proceedings of the 44th IEEE Conference on Decision and Control.
[30] Zongqing Lu, et al. Learning Attentional Communication for Multi-Agent Cooperation, 2018, NeurIPS.
[31] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[32] Sergey Levine, et al. Data-Efficient Hierarchical Reinforcement Learning, 2018, NeurIPS.
[33] Gábor Lugosi, et al. Minimax Policies for Combinatorial Prediction Games, 2011, COLT.
[34] Yoshua Bengio, et al. Learning a synaptic learning rule, 1991, IJCNN-91-Seattle International Joint Conference on Neural Networks.
[35] Xianhao Xu, et al. Evaluating battery charging and swapping strategies in a robotic mobile fulfillment system, 2017, Eur. J. Oper. Res.
[36] Wei Chen, et al. Combinatorial Multi-Armed Bandit: General Framework and Applications, 2013, ICML.
[37] Tamás Vicsek, et al. Optimized flocking of autonomous drones in confined environments, 2018, Science Robotics.
[38] Alexandru Iosup, et al. The Grid Workloads Archive, 2008, Future Gener. Comput. Syst.
[39] Roger McHaney, et al. Modelling battery constraints in discrete event automated guided vehicle simulations, 1995.
[40] Rémi Coulom, et al. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, 2006, Computers and Games.
[41] Paolo Fiorini, et al. Motion Planning in Dynamic Environments Using Velocity Obstacles, 1998, Int. J. Robotics Res.
[42] Dan Ventura, et al. Predicting and Preventing Coordination Problems in Cooperative Q-learning Systems, 2007, IJCAI.
[43] Nikos A. Vlassis, et al. Collaborative Multiagent Reinforcement Learning by Payoff Propagation, 2006, J. Mach. Learn. Res.
[44] Nando de Freitas, et al. Sample Efficient Actor-Critic with Experience Replay, 2016, ICLR.
[45] Paul W. Goldberg, et al. The complexity of computing a Nash equilibrium, 2006, STOC '06.
[46] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[47] Sergey Levine, et al. Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables, 2019, ICML.
[48] Pieter Abbeel, et al. Automatic Goal Generation for Reinforcement Learning Agents, 2017, ICML.
[49] L. Shapley, et al. Stochastic Games, 1953, Proceedings of the National Academy of Sciences.
[50] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[51] Ralf Stetter, et al. A multi-agent reinforcement learning approach for the efficient control of mobile robot, 2013, 2013 IEEE 7th International Conference on Intelligent Data Acquisition and Advanced Computing Systems (IDAACS).
[52] Oriol Vinyals, et al. Matching Networks for One Shot Learning, 2016, NIPS.
[53] Tor Lattimore, et al. Near-optimal PAC bounds for discounted MDPs, 2014, Theor. Comput. Sci.
[54] Yang Yu, et al. Reinforcement Learning with Derivative-Free Exploration, 2019, AAMAS.
[55] Gordon F. Royle, et al. Algebraic Graph Theory, 2001, Graduate Texts in Mathematics.
[56] Joel Z. Leibo, et al. Multi-agent Reinforcement Learning in Sequential Social Dilemmas, 2017, AAMAS.
[57] Nicolò Cesa-Bianchi, et al. Combinatorial Bandits, 2012, COLT.
[58] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[59] Alexei A. Efros, et al. Curiosity-Driven Exploration by Self-Supervised Prediction, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[60] Peter L. Bartlett, et al. RL²: Fast Reinforcement Learning via Slow Reinforcement Learning, 2016, ArXiv.
[61] Shimon Whiteson, et al. Computing Convex Coverage Sets for Faster Multi-objective Coordination, 2015, J. Artif. Intell. Res.
[62] Tuomas Sandholm, et al. Computing an approximate jam/fold equilibrium for 3-player no-limit Texas Hold'em tournaments, 2008, AAMAS.
[63] Reza Olfati-Saber, et al. Consensus and Cooperation in Networked Multi-Agent Systems, 2007, Proceedings of the IEEE.
[64] Pieter Abbeel, et al. A Simple Neural Attentive Meta-Learner, 2017, ICLR.
[65] Pieter Abbeel, et al. Benchmarking Deep Reinforcement Learning for Continuous Control, 2016, ICML.
[66] Lubomír Bakule, et al. Decentralized control: An overview, 2008, Annu. Rev. Control.
[67] Wei Ren, et al. Distributed leaderless consensus algorithms for networked Euler–Lagrange systems, 2009, Int. J. Control.
[68] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
[69] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.
[70] Carlos Guestrin, et al. Multiagent Planning with Factored MDPs, 2001, NIPS.
[71] Erfu Yang, et al. Multiagent Reinforcement Learning for Multi-Robot Systems: A Survey, 2004.
[72] Marc G. Bellemare, et al. Count-Based Exploration with Neural Density Models, 2017, ICML.
[73] Michael L. Littman, et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning, 1994, ICML.
[74] Jonathan P. How, et al. Decentralized non-communicating multiagent collision avoidance with deep reinforcement learning, 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).
[75] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[76] D. Fudenberg, et al. The Theory of Learning in Games, 1998.
[77] Daan Wierstra, et al. Meta-Learning with Memory-Augmented Neural Networks, 2016, ICML.
[78] Afsaneh Haddadi, et al. Application of multi-agent systems in traffic and transportation, 1997, IEE Proc. Softw. Eng.
[79] Anton Kabysh, et al. Influence Learning for Multi-Agent System Based on Reinforcement Learning, 2014.
[80] Benjamin Van Roy, et al. Deep Exploration via Bootstrapped DQN, 2016, NIPS.
[81] Pierre-Yves Oudeyer, et al. R-IAC: Robust Intrinsically Motivated Exploration and Active Learning, 2009, IEEE Transactions on Autonomous Mental Development.
[82] N. Vieille. Two-player stochastic games I: A reduction, 2000.
[83] Frank Harary, et al. Graph Theory, 2016.
[84] Tom Schaul, et al. Unifying Count-Based Exploration and Intrinsic Motivation, 2016, NIPS.
[85] Fei Sha, et al. Actor-Attention-Critic for Multi-Agent Reinforcement Learning, 2018, ICML.
[86] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[87] Wei Chu, et al. A contextual-bandit approach to personalized news article recommendation, 2010, WWW '10.
[88] Dinesh Manocha, et al. Reciprocal Velocity Obstacles for real-time multi-agent navigation, 2008, 2008 IEEE International Conference on Robotics and Automation.
[89] Ann Nowé, et al. Exploring selfish reinforcement learning in repeated games with stochastic rewards, 2007, Autonomous Agents and Multi-Agent Systems.
[90] Pierre-Yves Oudeyer, et al. Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning, 2017, J. Mach. Learn. Res.
[91] Jie Xu, et al. Budget-Constrained Edge Service Provisioning With Demand Estimation via Bandit Learning, 2019, IEEE Journal on Selected Areas in Communications.
[92] Aleksandrs Slivkins, et al. Contextual Bandits with Similarity Information, 2009, COLT.
[93] Farzaneh Abdollahi, et al. A Decentralized Cooperative Control Scheme With Obstacle Avoidance for a Team of Mobile Robots, 2014, IEEE Transactions on Industrial Electronics.
[94] Shimon Whiteson, et al. Counterfactual Multi-Agent Policy Gradients, 2017, AAAI.
[95] Alexei A. Efros, et al. Large-Scale Study of Curiosity-Driven Learning, 2018, ICLR.
[96] Yi Wu, et al. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments, 2017, NIPS.
[97] J. Andrew Bagnell, et al. Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy, 2010.
[98] Sergey Levine, et al. QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation, 2018, CoRL.
[99] Sham M. Kakade, et al. Provably Efficient Maximum Entropy Exploration, 2018, ICML.
[100] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[101] Yuan Tian, et al. H∞ Model-free Reinforcement Learning with Robust Stability Guarantee, 2019, ArXiv.
[102] Wolfgang Hönig, et al. Robust Trajectory Execution for Multi-robot Teams Using Distributed Real-time Replanning, 2018, DARS.
[103] H. Kuk. On equilibrium points in bimatrix games, 1996.
[104] Xiaoyan Zhu, et al. Contextual Combinatorial Bandit and its Application on Diversified Online Recommendation, 2014, SDM.
[105] Rahul Savani, et al. Negative Update Intervals in Deep Multi-Agent Reinforcement Learning, 2018, AAMAS.
[106] Sebastian Scherer, et al. Improving Stochastic Policy Gradients in Continuous Control with Deep Reinforcement Learning using the Beta Distribution, 2017, ICML.
[107] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[108] Shlomo Zilberstein, et al. Incremental Policy Generation for Finite-Horizon DEC-POMDPs, 2009, ICAPS.
[109] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[110] Koray Kavukcuoglu, et al. PGQ: Combining policy gradient and Q-learning, 2016, ArXiv.
[111] Andrew Y. Ng, et al. Near-Bayesian exploration in polynomial time, 2009, ICML '09.
[112] Saptarshi Bandyopadhyay, et al. Fast, On-line Collision Avoidance for Dynamic Vehicles Using Buffered Voronoi Cells, 2017, IEEE Robotics and Automation Letters.
[113] Saptarshi Bandyopadhyay, et al. Probabilistic swarm guidance using optimal transport, 2014, 2014 IEEE Conference on Control Applications (CCA).
[114] David Q. Mayne, et al. Constrained model predictive control: Stability and optimality, 2000, Autom.
[115] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[116] Pierre-Yves Oudeyer, et al. Unsupervised Learning of Goal Spaces for Intrinsically Motivated Goal Exploration, 2018, ICLR.
[117] Tom Schaul, et al. FeUdal Networks for Hierarchical Reinforcement Learning, 2017, ICML.
[118] Xi Chen, et al. 3-NASH is PPAD-Complete, 2005, Electron. Colloquium Comput. Complex.
[119] Hugh H. T. Liu, et al. UDE-Based Robust Command Filtered Backstepping Control for Close Formation Flight, 2018, IEEE Transactions on Industrial Electronics.
[120] Vijay Kumar, et al. Sensing and coverage for a network of heterogeneous robots, 2008, 2008 47th IEEE Conference on Decision and Control.
[121] Ying Wang, et al. Multi-robot Box-pushing: Single-Agent Q-Learning vs. Team Q-Learning, 2006, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[122] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[123] Guillaume J. Laurent, et al. Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems, 2012, The Knowledge Engineering Review.
[124] Craig Boutilier, et al. The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems, 1998, AAAI/IAAI.
[125] Sean Luke, et al. Lenient learners in cooperative multiagent systems, 2006, AAMAS '06.
[126] Maria L. Gini, et al. Adaptive Learning for Multi-Agent Navigation, 2015, AAMAS.
[127] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[128] Bo An, et al. Computing Solutions in Infinite-Horizon Discounted Adversarial Patrolling Games, 2014, ICAPS.
[129] Shimon Whiteson, et al. Learning to Communicate with Deep Multi-Agent Reinforcement Learning, 2016, NIPS.
[130] Yoav Shoham, et al. Simple search methods for finding a Nash equilibrium, 2004, Games Econ. Behav.
[131] Maria L. Gini, et al. Implicit Coordination in Crowded Multi-Agent Navigation, 2016, AAAI.
[132] Herke van Hoof, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[133] Jose B. Cruz, et al. Game Theoretic Approach to Threat Prediction and Situation Awareness, 2006, 2006 9th International Conference on Information Fusion.
[134] Mykel J. Kochenderfer, et al. Cooperative Multi-agent Control Using Deep Reinforcement Learning, 2017, AAMAS Workshops.
[135] Doina Precup, et al. The Option-Critic Architecture, 2016, AAAI.
[136] Wei Pan, et al. Model-Reference Reinforcement Learning for Collision-Free Tracking Control of Autonomous Surface Vehicles, 2020, IEEE Transactions on Intelligent Transportation Systems.
[137] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[138] C. A. Nelson, et al. Learning to Learn, 2017, Encyclopedia of Machine Learning and Data Mining.
[139] Sébastien Bubeck, et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, 2012, Found. Trends Mach. Learn.
[140] Pieter Abbeel, et al. Some Considerations on Learning to Explore via Meta-Reinforcement Learning, 2018, ICLR.
[141] Henry Zhu, et al. Soft Actor-Critic Algorithms and Applications, 2018, ArXiv.
[142] Peter Vrancx, et al. Switching dynamics of multi-agent learning, 2008, AAMAS.
[143] Victor Talpaert, et al. Deep Reinforcement Learning for Autonomous Driving: A Survey, 2020, IEEE Transactions on Intelligent Transportation Systems.
[144] Geoffrey E. Hinton, et al. Feudal Reinforcement Learning, 1992, NIPS.
[145] Masashi Sugiyama, et al. Reducing Overestimation Bias in Multi-Agent Domains Using Double Centralized Critics, 2019, ArXiv.
[146] Dinesh Manocha, et al. Reciprocal n-Body Collision Avoidance, 2011, ISRR.
[147] Ann Nowé, et al. Learning to Coordinate with Coordination Graphs in Repeated Single-Stage Multi-Agent Decision Problems, 2018, ICML.
[148] Chao Qian, et al. Self-Guided Evolution Strategies with Historical Estimated Gradients, 2020, IJCAI.
[149] Ming Xin, et al. Integrated Optimal Formation Control of Multiple Unmanned Aerial Vehicles, 2012, IEEE Transactions on Control Systems Technology.
[150] Xiaotie Deng, et al. Settling the Complexity of Two-Player Nash Equilibrium, 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).
[151] Yang Yu, et al. Towards Sample Efficient Reinforcement Learning, 2018, IJCAI.
[152] Javier Larrosa, et al. Bucket elimination for multiobjective optimization problems, 2006, J. Heuristics.
[153] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[154] John Enright, et al. Optimization and Coordinated Autonomy in Mobile Fulfillment Systems, 2011, Automated Action Planning for Autonomous Mobile Robots.
[155] Tuomas Sandholm, et al. Computing Equilibria in Multiplayer Stochastic Games of Imperfect Information, 2009, IJCAI.
[156] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[157] J. Nash. Equilibrium Points in N-Person Games, 1950, Proceedings of the National Academy of Sciences of the United States of America.
[158] Ofir Nachum, et al. A Lyapunov-based Approach to Safe Reinforcement Learning, 2018, NeurIPS.
[159] Marcin Andrychowicz, et al. Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research, 2018, ArXiv.
[160] Sridhar Mahadevan, et al. Recent Advances in Hierarchical Reinforcement Learning, 2003, Discret. Event Dyn. Syst.
[161] Yi Guo, et al. Nonlinear decentralized control of large-scale power systems, 2000, Autom.
[162] Guillaume J. Laurent, et al. Hysteretic Q-learning: an algorithm for decentralized reinforcement learning in cooperative multi-agent teams, 2007, 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[163] Sergey Levine, et al. Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review, 2018, ArXiv.
[164] Stuart J. Russell, et al. Reinforcement Learning with Hierarchies of Machines, 1997, NIPS.
[165] Marcin Andrychowicz, et al. Hindsight Experience Replay, 2017, NIPS.
[166] Wei Pan, et al. Model-Reference Reinforcement Learning Control of Autonomous Surface Vehicles, 2020, 2020 59th IEEE Conference on Decision and Control (CDC).
[167] Dolores Blanco, et al. Voronoi diagram and fast marching applied to path planning, 2006, Proceedings 2006 IEEE International Conference on Robotics and Automation (ICRA 2006).
[168] Michail G. Lagoudakis, et al. Coordinated Reinforcement Learning, 2002, ICML.
[169] Hugh H. T. Liu, et al. Aerodynamic model-based robust adaptive control for close formation flight, 2018, Aerospace Science and Technology.
[170] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.
[171] Yevgeniy Vorobeychik, et al. Computing Stackelberg Equilibria in Discounted Stochastic Games, 2012, AAAI.
[172] Krzysztof Choromanski, et al. From Complexity to Simplicity: Adaptive ES-Active Subspaces for Blackbox Optimization, 2019, NeurIPS.
[173] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[174] Nikos A. Vlassis, et al. Multi-robot decision making using coordination graphs, 2003.
[175] Zeb Kurth-Nelson, et al. Learning to reinforcement learn, 2016, CogSci.
[176] Martin Lauer, et al. An Algorithm for Distributed Reinforcement Learning in Cooperative Multi-Agent Systems, 2000, ICML.
[177] Sean Luke, et al. Lenient Learning in Independent-Learner Stochastic Cooperative Games, 2016, J. Mach. Learn. Res.
[178] Peter Henderson, et al. An Introduction to Deep Reinforcement Learning, 2018, Found. Trends Mach. Learn.
[179] Amos J. Storkey, et al. Exploration by Random Network Distillation, 2018, ICLR.
[180] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition, 1999, J. Artif. Intell. Res.
[181] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[182] Bart De Schutter, et al. A Comprehensive Survey of Multiagent Reinforcement Learning, 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).
[183] Peter Vrancx, et al. Learning multi-agent state space representations, 2010, AAMAS.
[184] Eduardo F. Morales, et al. An Introduction to Reinforcement Learning, 2011.
[185] Bhaskar Krishnamachari, et al. Combinatorial Network Optimization With Unknown Variables: Multi-Armed Bandits With Linear Rewards and Individual Observations, 2010, IEEE/ACM Transactions on Networking.
[186] Bo An, et al. Impression Allocation for Combating Fraud in E-commerce Via Deep Reinforcement Learning with Action Norm Penalty, 2018, IJCAI.
[187] Kagan Tumer, et al. A multiagent approach to managing air traffic flow, 2010, Autonomous Agents and Multi-Agent Systems.
[188] Michael L. Littman, et al. An analysis of model-based Interval Estimation for Markov Decision Processes, 2008, J. Comput. Syst. Sci.
[189] Mark Jacobus Richard Ebben, et al. Logistic control in automated transportation networks, 2001.
[190] Robert Wilson, et al. A global Newton method to compute Nash equilibria, 2003, J. Econ. Theory.
[191] Shane Legg, et al. Noisy Networks for Exploration, 2017, ICLR.
[192] David A. Schoenwald, et al. Decentralized control of cooperative robotic vehicles: theory and application, 2002, IEEE Trans. Robotics Autom.
[193] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[194] Hugo Larochelle, et al. Optimization as a Model for Few-Shot Learning, 2016, ICLR.