Reinforcement Learning under Model Risk for Biomanufacturing Fermentation Control

In biopharmaceutical manufacturing, the fermentation process plays a critical role in productivity and profit. Because biotherapeutics are manufactured in living cells whose biological mechanisms are complex and whose outputs are highly variable, we introduce a model-based reinforcement learning framework that accounts for model risk, supports online learning of the bioprocess, and guides an optimal, robust, and customized stopping policy for the fermentation process. Specifically, building on the dynamic mechanisms of protein and impurity generation, we first construct a probabilistic model characterizing the impact of the underlying bioprocess stochastic uncertainty on impurity and protein growth rates. Because biopharmaceutical manufacturing often has very limited batch data during development and early-stage production, we derive the posterior distribution quantifying the process model risk and develop a Bayesian knowledge update to support online learning of the underlying stochastic process. With the prediction risk accounting for both bioprocess stochastic uncertainty and model risk, the proposed reinforcement learning framework can proactively hedge against all sources of uncertainty and support optimal, robust, and customized decision making. We conduct a structural analysis of the optimal policy and study the impact of model risk on policy selection. We show that the resulting policy asymptotically converges to the optimal policy obtained under perfect information about the underlying stochastic process. Our case studies demonstrate that the proposed framework can substantially improve industrial biomanufacturing practice.
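To make the distinction between model risk and stochastic uncertainty concrete, the following is a minimal sketch and not the paper's actual model. It assumes a single unknown mean growth rate with a Gaussian prior and Gaussian observation noise of known variance (a normal-normal conjugate pair chosen purely for illustration). The names `NormalBelief`, `update`, `predictive_var`, and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NormalBelief:
    """Gaussian belief over an unknown mean growth rate (illustrative assumption)."""
    mean: float
    var: float  # model risk: uncertainty about the mean itself


def update(belief: NormalBelief, observed_rate: float, noise_var: float) -> NormalBelief:
    """Conjugate normal-normal update with known observation noise variance."""
    precision = 1.0 / belief.var + 1.0 / noise_var
    post_var = 1.0 / precision
    post_mean = post_var * (belief.mean / belief.var + observed_rate / noise_var)
    return NormalBelief(post_mean, post_var)


def predictive_var(belief: NormalBelief, noise_var: float) -> float:
    """Predictive variance = model risk + inherent stochastic uncertainty."""
    return belief.var + noise_var


# Hypothetical prior and observed growth rates from a few batches.
prior = NormalBelief(mean=0.10, var=0.04)
noise_var = 0.01  # assumed known process noise variance

belief = prior
for rate in [0.22, 0.18, 0.20, 0.21]:
    belief = update(belief, rate, noise_var)

print(belief.mean, belief.var, predictive_var(belief, noise_var))
```

In this toy setting the posterior variance shrinks with each observed batch, so the predictive variance approaches the process noise variance alone. This mirrors, in a much simpler form, the idea that a policy learned under model risk approaches the perfect-information policy as data accumulate.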

[1]  M. A. Jordan Bioprocess engineering principles , 1996 .

[2]  Krist V. Gernaey,et al.  A review of control strategies for manipulating the feed rate in fed-batch fermentation processes. , 2017, Journal of biotechnology.

[3]  Tao Wang,et al.  Bayesian sparse sampling for on-line reward optimization , 2005, ICML.

[4]  Jiansheng Peng,et al.  Time-dependent fermentation control strategies for enhancing synthesis of marine bacteriocin 1701 using artificial neural network and genetic algorithm. , 2013, Bioresource technology.

[5]  Z. Soons,et al.  Constant specific growth rate in fed-batch cultivation of Bordetella pertussis using adaptive control. , 2006, Journal of biotechnology.

[6]  Andrew Y. Ng,et al.  Near-Bayesian exploration in polynomial time , 2009, ICML '09.

[7]  Benjamin Van Roy,et al.  (More) Efficient Reinforcement Learning via Posterior Sampling , 2013, NIPS.

[8]  Bernhard Sonnleitner,et al.  Controlled fed-batch by tracking the maximal culture capacity. , 2007, Journal of biotechnology.

[9]  Jesse Hoey,et al.  An analytic solution to discrete Bayesian reinforcement learning , 2006, ICML.

[10]  M. Littman,et al.  Approaching Bayes-optimalilty using Monte-Carlo tree search , 2011 .

[11]  Lihong Li,et al.  A Bayesian Sampling Approach to Exploration in Reinforcement Learning , 2009, UAI.

[12]  Ananth Krishnamurthy,et al.  Optimal condition-based harvesting policies for biomanufacturing operations with failure risks , 2016 .

[13]  Anurag S. Rathore,et al.  Reinforcement learning based optimization of process chromatography for continuous processing of biopharmaceuticals , 2021, Chemical Engineering Science.

[14]  Rimvydas Simutis,et al.  Improving the batch-to-batch reproducibility of microbial cultures during recombinant protein production by regulation of the total carbon dioxide production. , 2007, Journal of biotechnology.

[15]  John K Kruschke,et al.  Bayesian data analysis. , 2010, Wiley interdisciplinary reviews. Cognitive science.

[16]  W. N. Street,et al.  Financial Data and the Skewed Generalized T Distribution , 1998 .

[17]  Barry Lennox,et al.  Multivariate batch to batch optimisation of fermentation processes incorporating validity constraints , 2016 .

[18]  Joelle Pineau,et al.  Model-Based Bayesian Reinforcement Learning in Large Structured Domains , 2008, UAI.

[19]  Joelle Pineau,et al.  Bayesian reinforcement learning in continuous POMDPs with application to robot navigation , 2008, 2008 IEEE International Conference on Robotics and Automation.

[20]  Richard D. Braatz,et al.  Control systems technology in the advanced manufacturing of biologic drugs , 2015, 2015 IEEE Conference on Control Applications (CCA).

[21]  C. Herwig,et al.  Efficient feeding profile optimization for recombinant protein production using physiological information , 2012, Bioprocess and Biosystems Engineering.

[22]  Lucian Busoniu,et al.  Optimistic planning for belief-augmented Markov Decision Processes , 2013, 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).

[23]  Dale E Seborg,et al.  Fault Detection and Diagnosis in an Industrial Fed‐Batch Cell Culture Process , 2007, Biotechnology progress.

[24]  Liang Chang,et al.  Nonlinear model predictive control of fed-batch fermentations using dynamic flux balance models , 2016 .

[25]  Brian Glennon,et al.  Glucose concentration control of a fed-batch mammalian cell bioprocess using a nonlinear model predictive controller , 2014 .

[26]  Yishay Mansour,et al.  A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes , 1999, Machine Learning.

[27]  Chris P. Barnes,et al.  Deep reinforcement learning for the control of microbial co-cultures in bioreactors , 2020, PLoS Comput. Biol..

[28]  Hisbullah,et al.  Design of a Fuzzy Logic Controller for Regulating Substrate Feed to Fed-Batch Fermentation , 2003 .

[29]  Richard D. Braatz,et al.  Challenges and opportunities in biopharmaceutical manufacturing control , 2018, Comput. Chem. Eng..

[30]  Matthew C Coleman,et al.  Retrospective optimization of time‐dependent fermentation control strategies using time‐independent historical data , 2006, Biotechnology and bioengineering.

[31]  Ananth Krishnamurthy,et al.  Managing Trade-offs in Protein Manufacturing: How Much to Waste? , 2020, Manuf. Serv. Oper. Manag..

[32]  Andrew G. Barto,et al.  Optimal learning: computational procedures for bayes-adaptive markov decision processes , 2002 .

[33]  Ulf Schrader,et al.  From Science to Operations Questions , Choices and Strategies for Success in Biopharma , 2014 .

[34]  Shie Mannor,et al.  Bayesian Reinforcement Learning: A Survey , 2015, Found. Trends Mach. Learn..

[35]  Richard D. Braatz,et al.  pH and conductivity control in an integrated biomanufacturing plant , 2016, 2016 American Control Conference (ACC).

[36]  Ananth Krishnamurthy,et al.  Performance Guarantees and Optimal Purification Decisions for Engineered Proteins , 2017, Oper. Res..