Deep Reinforcement Learning Techniques in Diversified Domains: A Survey

There have been tremendous improvements in deep learning and reinforcement learning techniques, yet automating learning and intelligence to the full extent remains a challenge. The amalgamation of reinforcement learning and deep learning has produced breakthroughs in games and robotics over the past decade. Deep Reinforcement Learning (DRL) trains an agent on raw input, with learning driven by interaction with the environment. Motivated by the recent successes of DRL, we explore its adaptability to different domains and application areas. This paper presents a comprehensive survey of work done in recent years and of the simulation tools used for DRL. Current research focuses on recording experience more effectively and on refining the policy for future moves. We find that, even after strong results in Atari, Go, robotics, and multi-agent scenarios, challenges remain, including generalization, satisfying multiple objectives, divergence, and learning robust policies. Furthermore, complex environments and multiple agents pose new challenges and constitute an open area of research.
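The agent-environment interaction loop and the idea of recording experience for later policy refinement can be illustrated with a minimal sketch. The toy chain environment, the hyperparameters, and the tabular Q-learner below are illustrative assumptions, not methods from any surveyed paper; deep RL replaces the table with a neural network, but the loop and the replay buffer work the same way.

```python
import random
from collections import deque

# Toy 1-D chain environment (hypothetical): states 0..4, reward 1 on reaching state 4.
class ChainEnv:
    def __init__(self, n=5):
        self.n = n
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):  # action: 0 = left, 1 = right
        self.state = max(0, min(self.n - 1, self.state + (1 if action == 1 else -1)))
        done = self.state == self.n - 1
        return self.state, (1.0 if done else 0.0), done

def train(episodes=300, gamma=0.9, alpha=0.1, eps=0.2, batch=8, seed=0):
    random.seed(seed)
    env = ChainEnv()
    q = [[0.0, 0.0] for _ in range(env.n)]         # tabular Q-values
    replay = deque(maxlen=1000)                    # recorded experience: (s, a, r, s', done)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection: explore vs. exploit
            a = random.randrange(2) if random.random() < eps else max((0, 1), key=lambda x: q[s][x])
            s2, r, done = env.step(a)
            replay.append((s, a, r, s2, done))
            s = s2
            # refine the policy from a sampled mini-batch of past experience
            for bs, ba, br, bs2, bd in random.sample(replay, min(batch, len(replay))):
                target = br + (0.0 if bd else gamma * max(q[bs2]))
                q[bs][ba] += alpha * (target - q[bs][ba])
    return q

if __name__ == "__main__":
    q = train()
    print([max((0, 1), key=lambda a: q[s][a]) for s in range(4)])
```

Under these settings the greedy policy learns to move right in every non-terminal state; sampling from the replay buffer, rather than learning only from the latest transition, is the "recording the experience in a better way" that much of the surveyed work builds on.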
