Batch Mode TD($\lambda$) for Controlling Partially Observable Gene Regulatory Networks

External control of gene regulatory networks (GRNs) has received much attention in recent years. The aim is to find a series of actions to apply to a gene regulation system so that it avoids its diseased states. In this work, we propose a novel method for controlling partially observable GRNs that combines batch mode reinforcement learning (Batch RL) and TD($\lambda$) algorithms. Existing studies infer a computational model from gene expression data and then obtain a control policy over the constructed model. Our idea, in contrast, is to interpret the time series gene expression data as a sequence of observations produced by the system, and to obtain an approximate stochastic policy directly from that data, without estimating the internal states of the partially observable environment. This eliminates the most time-consuming phases of the existing studies, namely inferring a model and running the model for control. Results show that our method provides control solutions for regulation systems of several thousand genes in only seconds, whereas existing studies cannot solve control problems of even a few dozen genes. Results also show that our approximate stochastic policies are almost as good as the policies generated by the existing studies.
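The general idea of learning from logged observations can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a tabular Sarsa($\lambda$)-style update with replacing eligibility traces, replayed several times over a fixed batch, with action values keyed directly by the observed expression pattern and a softmax used to turn values into a stochastic policy. The function names, hyperparameters, reward convention, and toy data are all hypothetical.

```python
import math
from collections import defaultdict


def batch_sarsa_lambda(episodes, n_actions, alpha=0.1, gamma=0.9, lam=0.8, sweeps=50):
    """Tabular Sarsa(lambda) replayed over a fixed batch of logged episodes.

    Each episode is a list of (observation, action, reward) tuples. The
    observation itself is the table key, so no internal state is estimated.
    All hyperparameter defaults are illustrative assumptions.
    """
    q = defaultdict(lambda: [0.0] * n_actions)
    for _ in range(sweeps):
        for episode in episodes:
            traces = defaultdict(float)  # eligibility traces, reset per episode
            for t, (obs, act, reward) in enumerate(episode):
                if t + 1 < len(episode):
                    next_obs, next_act, _ = episode[t + 1]
                    target = reward + gamma * q[next_obs][next_act]
                else:
                    target = reward  # last logged step: nothing to bootstrap from
                delta = target - q[obs][act]
                traces[(obs, act)] = 1.0  # replacing trace
                for (o, a), e in list(traces.items()):
                    q[o][a] += alpha * delta * e
                    traces[(o, a)] = gamma * lam * e  # decay after the update
    return q


def softmax_policy(q_values, temperature=1.0):
    """Turn a list of action values into action probabilities."""
    best = max(q_values)
    weights = [math.exp((v - best) / temperature) for v in q_values]
    total = sum(weights)
    return [w / total for w in weights]


# Toy batch (hypothetical): observations are binarized expression levels of two
# genes, and gene 0 being "on" is treated as the diseased condition.
# Action 1 = intervene, action 0 = do nothing.
episodes = [
    [((1, 0), 1, 1.0), ((0, 0), 0, 0.0)],    # intervening left the diseased pattern
    [((1, 0), 0, -1.0), ((1, 0), 0, -1.0)],  # doing nothing stayed diseased
]
q = batch_sarsa_lambda(episodes, n_actions=2)
policy = softmax_policy(q[(1, 0)])
```

In this toy batch the learned policy for the diseased observation `(1, 0)` puts more probability on intervening than on doing nothing. A realistic setting would need function approximation rather than a table once the number of genes grows.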
