Improving Efficiency of Training a Virtual Treatment Planner Network via Knowledge-guided Deep Reinforcement Learning for Intelligent Automatic Treatment Planning of Radiotherapy.

PURPOSE We previously proposed an intelligent automatic treatment planning framework for radiotherapy, in which a virtual treatment planner network (VTPN) is built using deep reinforcement learning (DRL) to operate a treatment planning system (TPS) by adjusting its treatment planning parameters to generate high-quality plans. We demonstrated the feasibility of this idea in prostate cancer intensity-modulated radiation therapy (IMRT). Despite this success, training a VTPN via the standard DRL approach with an ε-greedy algorithm was time-consuming. The required training time was expected to grow with the complexity of the treatment planning problem, preventing the development of VTPN for more complicated but clinically relevant scenarios. In this study, we proposed a novel knowledge-guided DRL (KgDRL) approach that incorporated knowledge from human planners to guide the training process and thereby improve the efficiency of training a VTPN.
METHOD Using prostate cancer IMRT as a testbed, we first summarized a number of rules for adjusting the treatment planning parameters of our in-house TPS. During the training of the VTPN, in addition to randomly navigating the large state-action space, as in the standard DRL approach using the ε-greedy algorithm, we also sampled actions defined by the rules. The priority of sampling actions from the rules decreased over the training process to encourage the VTPN to explore new parameter-adjustment policies that were not covered by the rules. To test this idea, we trained a VTPN using KgDRL and compared its performance with that of another VTPN trained using the standard DRL approach. Both networks were trained on 10 patient cases and validated on 5 additional cases, while another 59 cases were used for evaluation.
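
The rule-guided sampling described in METHOD can be sketched as a small extension of ε-greedy action selection. This is only an illustrative sketch, not the implementation used in the study: the function names, the geometric decay schedule, and the values of `p0`, `decay`, and `epsilon` are all assumptions, since the abstract states only that the priority of rule-based sampling decreased over training.

```python
import random


def rule_probability(episode, p0=0.8, decay=0.9):
    """Chance of following a planner rule while exploring.

    Assumed geometric schedule: starts at p0 and shrinks every episode.
    """
    return p0 * decay ** episode


def select_action(q_values, rule_action, episode, epsilon=0.1, rng=random):
    """Return an action index: greedy, rule-guided, or uniformly random.

    q_values    -- estimated action values from the network (hypothetical input)
    rule_action -- index suggested by the human-planner rules, or None
    """
    if rng.random() >= epsilon:
        # Exploit: take the action with the highest estimated value.
        return max(range(len(q_values)), key=q_values.__getitem__)
    if rule_action is not None and rng.random() < rule_probability(episode):
        # Explore along the human-planner rule.
        return rule_action
    # Explore uniformly at random, as in standard epsilon-greedy.
    return rng.randrange(len(q_values))
```

Early in training most exploratory steps follow the rules, while later the exploration becomes almost entirely random, so the network can move beyond the policies the rules encode.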
RESULTS The VTPNs trained via KgDRL and via standard DRL both spontaneously learned how to operate the in-house TPS to generate high-quality plans, achieving plan quality scores of 8.82 (±0.29) and 8.43 (±0.48), respectively. Both VTPNs outperformed treatment planning based purely on the rules, which had a plan score of 7.81 (±1.59). The VTPN trained for eight episodes using KgDRL performed similarly to the one trained for 100 epochs using standard DRL. The training time was reduced from more than a week to approximately 13 hours.
CONCLUSION The proposed KgDRL framework was effective in accelerating the training of a VTPN by incorporating human knowledge, which will facilitate the development of VTPN for more complicated treatment planning scenarios.
