Pierre Baldi | Roy Fox | Forest Agostinelli | Stephen McAleer | Alexander Shmakov
[1] Nils J. Nilsson, et al. A Formal Basis for the Heuristic Determination of Minimum Cost Paths, 1968, IEEE Trans. Syst. Sci. Cybern.
[2] Sandra Zilles, et al. Learning heuristic functions for large state spaces, 2011, Artif. Intell.
[3] Pierre Baldi, et al. Solving the Rubik's Cube Without Human Knowledge, 2018, ArXiv.
[4] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[5] Marco Gori, et al. Likely-Admissible and Sub-Symbolic Heuristics, 2004, ECAI.
[6] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[7] Demis Hassabis, et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play, 2018, Science.
[8] Yunguan Fu, et al. Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization, 2018, ArXiv.
[9] Dong-Ling Deng, et al. Topological Quantum Compiling with Reinforcement Learning, 2020, Physical Review Letters.
[10] Carla P. Gomes, et al. A Novel Automated Curriculum Strategy to Solve Hard Sokoban Planning Instances, 2021, NeurIPS.
[11] Pierre Baldi, et al. Solving the Rubik's cube with deep reinforcement learning and search, 2019, Nature Machine Intelligence.
[12] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[13] Jungha Jin, et al. 3D CUBE Algorithm for the Key Generation Method: Applying Deep Neural Network Learning-Based, 2020, IEEE Access.
[14] Sean R. Eddy, et al. What is dynamic programming?, 2004, Nature Biotechnology.
[15] Kurt Hornik, et al. Multilayer feedforward networks are universal approximators, 1989, Neural Networks.
[16] M. Puterman, et al. Modified Policy Iteration Algorithms for Discounted Markov Decision Problems, 1978.
[17] David K. Smith, et al. Dynamic Programming and Optimal Control, Volume 1, 1996.
[18] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Teruhisa Miura, et al. A* with Partial Expansion for Large Branching Factor Problems, 2000, AAAI/IAAI.
[20] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[21] Jure Leskovec, et al. Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation, 2018, NeurIPS.
[22] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[23] Nathan R. Sturtevant, et al. Partial-Expansion A* with Selective Node Generation, 2012, SOCS.
[24] Jyh-Da Wei, et al. Using Neural Networks for Evaluation in Heuristic Search Algorithm, 2011, AAAI.
[25] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[26] Joseph C. Culberson, et al. Pattern Databases, 1998, Computational Intelligence.
[27] Le Song, et al. Retro*: Learning Retrosynthetic Planning with Neural Guided A* Search, 2020, ICML.
[28] Ira Pohl, et al. Heuristic Search Viewed as Path Finding in a Graph, 1970, Artif. Intell.
[29] Rémi Coulom, et al. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, 2006, Computers and Games.
[30] Jürgen Schmidhuber, et al. Deep learning in neural networks: An overview, 2014, Neural Networks.
[31] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.