Sepp Hochreiter | Michael Widrich | Thomas Unterthiner | Jose A. Arjona-Medina | Michael Gillhofer
[1] V. Marčenko,et al. DISTRIBUTION OF EIGENVALUES FOR SOME SETS OF RANDOM MATRICES , 1967 .
[2] A. H. Klopf,et al. Brain Function and Adaptive Systems: A Heterostatic Theory , 1972 .
[3] A G Barto,et al. Toward a modern theory of adaptive networks: expectation and prediction. , 1981, Psychological review.
[4] M. J. Sobel. The variance of discounted Markov decision processes , 1982 .
[5] Frank Fallside,et al. Dynamic reinforcement driven error propagation networks with application to game playing , 1989 .
[6] C. Watkins. Learning from delayed rewards , 1989 .
[7] Jürgen Schmidhuber,et al. Reinforcement Learning in Markovian and Non-Markovian Environments , 1990, NIPS.
[9] P. Tseng. Solving H-horizon, stationary Markov decision problems in time proportional to log(H) , 1990 .
[10] Sepp Hochreiter,et al. Untersuchungen zu dynamischen neuronalen Netzen , 1991 .
[11] John N. Tsitsiklis,et al. An Analysis of Stochastic Shortest Path Problems , 1991, Math. Oper. Res..
[12] Long-Ji Lin,et al. Reinforcement learning for robots using neural networks , 1992 .
[13] Michael I. Jordan,et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES , 1996 .
[14] George H. John. When the Best Move Isn't Optimal: Q-learning with Exploration , 1994, AAAI.
[15] Mahesan Niranjan,et al. On-line Q-learning using connectionist systems , 1994 .
[16] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[17] Richard S. Sutton,et al. A Menu of Designs for Reinforcement Learning Over Time , 1995 .
[18] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[19] J. Jachymski. Continuous dependence of attractors of iterated function systems , 1996 .
[20] Jürgen Schmidhuber,et al. LSTM can Solve Hard Long Time Lag Problems , 1996, NIPS.
[21] Dimitri P. Bertsekas,et al. Stochastic shortest path games: theory and algorithms , 1997 .
[22] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[23] V. Borkar. Stochastic approximation with two time scales , 1997 .
[24] E. Kirr,et al. Continuous dependence on parameters of the fixed points set for some set-valued operators , 1997 .
[25] Stephen D. Patek,et al. Stochastic shortest path games: theory and algorithms , 1997 .
[26] G. Lugosi,et al. On Concentration-of-Measure Inequalities , 1998 .
[27] S. Hochreiter. Recurrent Neural Net Learning and Vanishing Gradient , 1998 .
[28] Sepp Hochreiter,et al. The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions , 1998, Int. J. Uncertain. Fuzziness Knowl. Based Syst..
[29] A. C. Rencher. Linear models in statistics , 1999 .
[30] Andrew Y. Ng,et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.
[31] Stefan Schaal,et al. Is imitation learning the route to humanoid robots? , 1999, Trends in Cognitive Sciences.
[32] Jürgen Schmidhuber,et al. Learning to Forget: Continual Prediction with LSTM , 2000, Neural Computation.
[33] Jürgen Schmidhuber,et al. Recurrent nets that time and count , 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium.
[35] Balaraman Ravindran,et al. Symmetries and Model Minimization in Markov Decision Processes , 2001 .
[36] Sepp Hochreiter,et al. Learning to Learn Using Gradient Descent , 2001, ICANN.
[37] Bram Bakker,et al. Reinforcement Learning with Long Short-Term Memory , 2001, NIPS.
[38] E. Oja,et al. Independent Component Analysis , 2013 .
[39] A. Soshnikov. A Note on Universality of the Distribution of the Largest Eigenvalues in Certain Sample Covariance Matrices , 2001, math/0104113.
[40] Yoshua Bengio,et al. Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies , 2001 .
[41] Balaraman Ravindran,et al. SMDP Homomorphisms: An Algebraic Approach to Abstraction in Semi-Markov Decision Processes , 2003, IJCAI.
[42] Robert Givan,et al. Equivalence notions and model minimization in Markov decision processes , 2003, Artif. Intell..
[44] Garrison W. Cottrell,et al. Principled Methods for Advising Reinforcement Learning Agents , 2003, ICML.
[45] Eric Wiewiora,et al. Potential-Based Shaping and Q-Value Initialization are Equivalent , 2003, J. Artif. Intell. Res..
[46] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[47] Peter Dayan,et al. The convergence of TD(λ) for general λ , 1992, Machine Learning.
[48] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[49] Richard S. Sutton,et al. Reinforcement learning with replacing eligibility traces , 2004, Machine Learning.
[50] John N. Tsitsiklis,et al. Asynchronous Stochastic Approximation and Q-Learning , 1994, Machine Learning.
[51] Tommi S. Jaakkola,et al. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms , 2000, Machine Learning.
[52] William Bolton,et al. Instrumentation And Control Systems , 2004 .
[53] Warren B. Powell,et al. Reinforcement Learning and Its Relationship to Supervised Learning , 2004 .
[54] Richard S. Sutton,et al. Landmark learning: An illustration of associative search , 1981, Biological Cybernetics.
[55] Andrew W. Moore,et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time , 1993, Machine Learning.
[56] Jing Peng,et al. Incremental multi-step Q-learning , 1994, Machine Learning.
[57] Jürgen Schmidhuber,et al. Framewise phoneme classification with bidirectional LSTM and other neural network architectures , 2005, Neural Networks.
[58] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[59] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[60] Duane Szafron,et al. Visual Explanation of Evidence with Additive Classifiers , 2006, AAAI.
[61] Warren B. Powell,et al. Handbook of Learning and Approximate Dynamic Programming , 2006, IEEE Transactions on Automatic Control.
[62] Thomas J. Walsh,et al. Towards a Unified Theory of State Abstraction for MDPs , 2006, AI&M.
[63] Csaba Szepesvári,et al. Bandit Based Monte-Carlo Planning , 2006, ECML.
[64] Klaus Obermayer,et al. Fast model-based protein homology detection without alignment , 2007, Bioinform..
[65] Stefan Schaal,et al. Reinforcement learning by reward-weighted regression for operational space control , 2007, ICML '07.
[66] John N. Tsitsiklis,et al. Bias and Variance Approximation in Value Function Estimates , 2007, Manag. Sci..
[67] M. Frigon. Fixed point and continuation results for contractions in metric and gauge spaces , 2007 .
[68] B. Bakker,et al. Reinforcement learning by backpropagation through an LSTM model/critic , 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[69] Peter Dayan,et al. Hippocampal Contributions to Control: The Third Way , 2007, NIPS.
[70] Hazhir Rahmandad,et al. Effects of feedback delay on learning , 2009 .
[71] J. Schmidhuber,et al. A Novel Connectionist System for Unconstrained Handwriting Recognition , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[72] Ingemar J. Cox,et al. Probably Approximately Correct Search , 2009, ICTIR.
[73] M. Rudelson,et al. Non-asymptotic theory of random matrices: extreme singular values , 2010, 1003.2990.
[74] Hado van Hasselt,et al. Double Q-learning , 2010, NIPS.
[75] U. Rieder,et al. Markov Decision Processes , 2010 .
[76] Klaus Obermayer,et al. The optimal unbiased value estimator and its relation to LSTD, TD and MC , 2010, Machine Learning.
[77] Ferenc Beleznay,et al. Comparing Value-Function Estimation Algorithms in Undiscounted Problems , 2012 .
[78] Shie Mannor,et al. Policy Gradients with Variance Related Risk Criteria , 2012, ICML.
[79] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.
[80] Melanie Mitchell,et al. Interpreting individual classifications of hierarchical networks , 2013, 2013 IEEE Symposium on Computational Intelligence and Data Mining (CIDM).
[81] Geoffrey E. Hinton,et al. Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[82] Jürgen Schmidhuber,et al. Evolving large-scale neural networks for vision-based reinforcement learning , 2013, GECCO '13.
[83] Andrew W. Senior,et al. Long short-term memory recurrent neural network architectures for large scale acoustic modeling , 2014, INTERSPEECH.
[84] Joaquín González-Rodríguez,et al. Automatic language identification using long short-term memory recurrent neural networks , 2014, INTERSPEECH.
[85] Björn W. Schuller,et al. Robust speech recognition using long short-term memory recurrent neural networks for hybrid acoustic modelling , 2014, INTERSPEECH.
[86] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[87] Wojciech Zaremba,et al. Recurrent Neural Network Regularization , 2014, ArXiv.
[88] Erik Marchi,et al. Multi-resolution linear prediction based features for audio onset detection with bidirectional LSTM neural networks , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[89] Alexander Binder,et al. On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation , 2015, PloS one.
[90] Jürgen Schmidhuber,et al. Deep learning in neural networks: An overview , 2014, Neural Networks.
[91] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[92] Nitish Srivastava,et al. Unsupervised Learning of Video Representations using LSTMs , 2015, ICML.
[93] Marcus Rohrbach,et al. Translating Videos to Natural Language Using Deep Recurrent Neural Networks , 2014, NAACL.
[94] David Vandyke,et al. Reward Shaping with Recurrent Neural Networks for Speeding up On-Line Policy Learning in Spoken Dialogue Systems , 2015, SIGDIAL Conference.
[95] Sam Devlin,et al. Expressing Arbitrary Reward Functions as Potential-Based Advice , 2015, AAAI.
[96] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[97] Trevor Darrell,et al. Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[98] Peter Stone,et al. Deep Recurrent Q-Learning for Partially Observable MDPs , 2015, AAAI Fall Symposia.
[99] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract) , 2012, IJCAI.
[100] Shie Mannor,et al. Learning the Variance of the Reward-To-Go , 2016, J. Mach. Learn. Res..
[101] Zhe L. Lin,et al. Top-Down Neural Attention by Excitation Backprop , 2016, ECCV.
[102] David Silver,et al. Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.
[103] Tom Schaul,et al. Dueling Network Architectures for Deep Reinforcement Learning , 2015, ICML.
[104] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[105] Zachary Feinstein. Continuity properties and sensitivity analysis of parameterized fixed points and approximate fixed points , 2016 .
[106] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[107] Tom Schaul,et al. Prioritized Experience Replay , 2015, ICLR.
[108] Sergey Levine,et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation , 2015, ICLR.
[109] Marc G. Bellemare,et al. Safe and Efficient Off-Policy Reinforcement Learning , 2016, NIPS.
[110] Yuval Tassa,et al. Learning and Transfer of Modulated Locomotor Controllers , 2016, ArXiv.
[111] Peter Stone,et al. On the Analysis of Complex Backup Strategies in Monte Carlo Tree Search , 2016, ICML.
[112] A. Veretennikov. Ergodic Markov processes and Poisson equations (lecture notes) , 2016, 1610.09661.
[113] Wojciech Zaremba,et al. OpenAI Gym , 2016, ArXiv.
[114] Demis Hassabis,et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm , 2017, ArXiv.
[115] Jukka Luoma,et al. Time delays, competitive interdependence, and firm performance , 2017 .
[116] Zhe L. Lin,et al. Top-Down Neural Attention by Excitation Backprop , 2016, International Journal of Computer Vision.
[117] Jürgen Schmidhuber,et al. LSTM: A Search Space Odyssey , 2015, IEEE Transactions on Neural Networks and Learning Systems.
[118] David Berthelot,et al. BEGAN: Boundary Equilibrium Generative Adversarial Networks , 2017, ArXiv.
[119] Marc G. Bellemare,et al. A Distributional Perspective on Reinforcement Learning , 2017, ICML.
[120] Tom Schaul,et al. Successor Features for Transfer in Reinforcement Learning , 2016, NIPS.
[121] Alexander Binder,et al. Explaining nonlinear classification decisions with deep Taylor decomposition , 2015, Pattern Recognit..
[122] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[123] Klaus-Robert Müller,et al. Explaining Recurrent Neural Network Predictions in Sentiment Analysis , 2017, WASSA@EMNLP.
[124] Ankur Taly,et al. Axiomatic Attribution for Deep Networks , 2017, ICML.
[125] Tom Schaul,et al. Rainbow: Combining Improvements in Deep Reinforcement Learning , 2017, AAAI.
[126] David Budden,et al. Distributed Prioritized Experience Replay , 2018, ICLR.
[127] David Silver,et al. Meta-Gradient Reinforcement Learning , 2018, NeurIPS.
[128] Matthew E. Taylor,et al. A survey and critique of multiagent deep reinforcement learning , 2018, Autonomous Agents and Multi-Agent Systems.
[129] Shane Legg,et al. Noisy Networks for Exploration , 2017, ICLR.
[130] Sergey Levine,et al. Learning Robust Rewards with Adversarial Inverse Reinforcement Learning , 2017, ICLR 2017.
[131] Nando de Freitas,et al. Playing hard exploration games by watching YouTube , 2018, NeurIPS.
[132] Shane Legg,et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures , 2018, ICML.
[133] Wojciech Samek,et al. Methods for interpreting and understanding deep neural networks , 2017, Digit. Signal Process..
[134] Rémi Munos,et al. Observe and Look Further: Achieving Consistent Performance on Atari , 2018, ArXiv.
[135] Ian Osband,et al. The Uncertainty Bellman Equation and Exploration , 2017, ICML.
[136] Peter Henderson,et al. Reward Estimation for Variance Reduction in Deep Reinforcement Learning , 2018, CoRL.
[137] Martin A. Riedmiller,et al. Learning by Playing - Solving Sparse Reward Tasks from Scratch , 2018, ICML.
[138] Ashley D. Edwards,et al. Forward-Backward Reinforcement Learning , 2018, ArXiv.
[139] Shalabh Bhatnagar,et al. Two Timescale Stochastic Approximation with Controlled Markov noise , 2015, Math. Oper. Res..
[140] Tom Schaul,et al. Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement , 2018, ICML.
[141] Christopher Joseph Pal,et al. Sparse Attentive Backtracking: Temporal CreditAssignment Through Reminding , 2018, NeurIPS.
[142] Jürgen Schmidhuber,et al. World Models , 2018, ArXiv.
[143] Yan Wu,et al. Optimizing agent behavior over long time scales by transporting value , 2018, Nature Communications.
[144] Sergey Levine,et al. Recall Traces: Backtracking Models for Efficient Reinforcement Learning , 2018, ICLR.
[145] Juergen Schmidhuber,et al. Reinforcement Learning Upside Down: Don't Predict Rewards - Just Map Them to Actions , 2019, ArXiv.
[146] Sae-Young Chung,et al. Sample-Efficient Deep Reinforcement Learning via Episodic Backward Update , 2018, NeurIPS.
[147] Filipe Wall Mutz,et al. Training Agents using Upside-Down Reinforcement Learning , 2019, ArXiv.
[149] Doina Precup,et al. Hindsight Credit Assignment , 2019, NeurIPS.
[150] Wojciech Samek,et al. Explaining and Interpreting LSTMs , 2019, Explainable AI.
[151] Richard Socher,et al. Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards , 2019, NeurIPS.
[152] Csaba Szepesvari,et al. Bandit Algorithms , 2020 .