Multi-objective optimization of radiotherapy: distributed Q-learning and agent-based simulation

Abstract Radiotherapy (RT) is one of the standard techniques for treating cancerous tumours, and many cancer patients are treated this way. Treatment planning is the most important phase of RT and plays a key role in the quality of therapy. Because the goal of RT is to irradiate the tumour with sufficiently high doses of radiation while sparing neighbouring healthy tissue as much as possible, it is naturally a multi-objective problem. In this study, we propose an agent-based model of vascular tumour growth and of the effects of RT. We then use a multi-objective distributed Q-learning (MDQ-learning) algorithm to find Pareto-optimal solutions for the dynamic RT dose. We consider multiple objectives, and each group of optimiser agents iteratively attempts to optimise one of them. At the end of each iteration, the agents reach a compromise among their solutions to shape the Pareto front of the multi-objective problem. We propose a new approach that defines three treatment-planning schemes, namely invasive, conservative and moderate, based on different combinations of our objectives. The invasive scheme prioritises killing cancer cells and pays less attention to the effects of irradiation on normal cells. The conservative scheme takes more care of normal cells and destroys cancer cells less aggressively. The moderate scheme lies in between. In the implementation, each scheme is handled by one agent in the MDQ-learning algorithm, and the Pareto-optimal solutions are discovered through the collaboration of the agents. By applying this methodology, we obtained Pareto-optimal treatment plans by building different scenarios of tumour growth and RT. The proposed multi-objective optimisation algorithm generates robust solutions and finds the best treatment plan for different conditions.
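To make the notions of a Pareto front and of the three schemes concrete, the following is a minimal illustrative sketch, not the MDQ-learning algorithm itself. It assumes only two objectives (fraction of cancer cells killed, fraction of normal cells damaged). The `Plan` type, the candidate plans, and the scheme weights in `SCHEMES` are hypothetical values chosen for illustration and do not come from the study.

```python
from typing import List, NamedTuple


class Plan(NamedTuple):
    """A hypothetical candidate treatment plan with two objective values."""
    name: str
    tumour_kill: float    # fraction of cancer cells killed (higher is better)
    normal_damage: float  # fraction of normal cells damaged (lower is better)


def dominates(a: Plan, b: Plan) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.tumour_kill >= b.tumour_kill and a.normal_damage <= b.normal_damage
    strictly_better = a.tumour_kill > b.tumour_kill or a.normal_damage < b.normal_damage
    return no_worse and strictly_better


def pareto_front(plans: List[Plan]) -> List[Plan]:
    """Keep only the plans that no other plan dominates."""
    return [p for p in plans if not any(dominates(q, p) for q in plans)]


# Assumed weights (w_kill, w_spare) for each scheme; illustrative only.
SCHEMES = {
    "invasive": (0.8, 0.2),
    "moderate": (0.5, 0.5),
    "conservative": (0.2, 0.8),
}


def scheme_reward(plan: Plan, scheme: str) -> float:
    """Scalar reward: reward tumour kill, penalise normal-tissue damage."""
    w_kill, w_spare = SCHEMES[scheme]
    return w_kill * plan.tumour_kill - w_spare * plan.normal_damage


def best_for_scheme(plans: List[Plan], scheme: str) -> Plan:
    """Pick the plan with the highest reward under the given scheme."""
    return max(plans, key=lambda p: scheme_reward(p, scheme))


# Hypothetical candidate plans.
plans = [
    Plan("A", 0.95, 0.40),
    Plan("B", 0.80, 0.20),
    Plan("C", 0.60, 0.05),
    Plan("D", 0.70, 0.30),  # dominated by B: lower kill and higher damage
]

front = pareto_front(plans)
for scheme in SCHEMES:
    print(scheme, "->", best_for_scheme(front, scheme).name)
```

With these assumed numbers, plan D drops out of the front, and each scheme selects a different non-dominated plan. This mirrors the idea that the invasive, moderate and conservative schemes correspond to different trade-offs along the same Pareto front.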
