The potential of random forest and neural networks for biomass and recombinant protein modeling in Escherichia coli fed‐batch fermentations

Product quality assurance strategies in the production of biopharmaceuticals are currently undergoing a transformation from empirical “quality by testing” to rational, knowledge‐based “quality by design” approaches. The major challenges in this context are the fragmentary understanding of bioprocesses and the severely limited real‐time access to process variables related to product quality and quantity. Data‐driven modeling of process variables, in combination with model predictive process control concepts, represents a potential solution to these problems. Selecting the statistical techniques best suited to bioprocess data analysis and modeling is therefore a key step. In this work, a series of recombinant Escherichia coli fed‐batch production processes with varying cultivation conditions was conducted, employing a comprehensive on‐ and offline process monitoring platform. The applicability of two machine learning methods, random forest and neural networks, for the prediction of cell dry mass and recombinant protein concentration from online available process parameters and two‐dimensional multi‐wavelength fluorescence spectroscopy was investigated. Models based solely on routinely measured process variables give a satisfactory prediction accuracy of about ±4% for the cell dry mass, while additional spectroscopic information allows the protein concentration to be estimated within ±12%. The results clearly argue for a combined approach: neural networks as the modeling technique and random forest as the variable selection tool.
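The combined approach named above can be illustrated with a minimal sketch. It is not the authors' implementation: the use of scikit-learn, the synthetic data, the number of retained variables (`n_keep`), and the network size are all assumptions made for illustration, and the feature columns merely stand in for online process signals and fluorescence channels.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 8 candidate predictors, of which
# only columns 0 and 1 actually drive the target variable.
rng = np.random.default_rng(0)
n_samples, n_features = 300, 8
X = rng.normal(size=(n_samples, n_features))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=n_samples)

# Simple hold-out split: first 200 rows for training, the rest for testing.
X_train, X_test = X[:200], X[200:]
y_train, y_test = y[:200], y[200:]

# Step 1: random forest as variable selection tool.
# Rank predictors by impurity-based importance and keep the top n_keep.
forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)
n_keep = 2  # hypothetical choice for this toy example
selected = np.argsort(forest.feature_importances_)[::-1][:n_keep]

# Step 2: neural network as modeling technique, trained on the selected
# variables only. Inputs are standardized before entering the network.
network = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(8,), solver="lbfgs",
                 max_iter=2000, random_state=0),
)
network.fit(X_train[:, selected], y_train)
predictions = network.predict(X_test[:, selected])

print("selected predictor columns:", sorted(selected.tolist()))
print("hold-out R^2:", network.score(X_test[:, selected], y_test))
```

In a real application the rows would be time points from several fermentation runs, so the train/test split would more plausibly be made by run rather than by row.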
