Harnessing the potential of machine learning for advancing “Quality by Design” in biomanufacturing

ABSTRACT Ensuring consistently high yields and product quality is a key challenge in biomanufacturing. Even minor deviations in critical process parameters (CPPs), such as media and feed compositions, can significantly affect product critical quality attributes (CQAs). To identify CPPs and their interdependencies with product yield and CQAs, industry typically uses design of experiments and multivariate statistical approaches. Although these models can predict the effect of CPPs on product yield, there is room to improve CQA prediction performance by capturing the complex relationships in high-dimensional data. In this regard, machine learning (ML) approaches offer immense potential for handling non-linear datasets and can thus identify new CPPs that effectively predict the CQAs. ML techniques can also be combined with mechanistic models as ‘hybrid ML’ or ‘white box ML’ to identify how CPPs affect product yield and quality mechanistically, thus enabling rational design and control of the bioprocess. In this review, we describe the role of statistical modeling in Quality by Design (QbD) for biomanufacturing and provide a generic outline of how relevant ML can be used to meaningfully analyze bioprocessing datasets. We then offer our perspectives on how relevant use of ML can accelerate the implementation of systematic QbD within the biopharma 4.0 paradigm.
