Mining manufacturing data for discovery of high productivity process characteristics.

Modern manufacturing facilities for bioproducts are highly automated, with advanced process monitoring and data archiving systems. The time dynamics of hundreds of process parameters and outcome variables over a large number of production runs are archived in a data warehouse. This vast amount of data is a vital resource for understanding the complex characteristics of bioprocesses and enhancing production robustness. Cell culture process data from 108 'trains', comprising production as well as inoculum bioreactors, from Genentech's manufacturing facility were investigated. Each run comprises over one hundred on-line and off-line temporal parameters. A kernel-based approach combined with a maximum margin-based support vector regression algorithm was used to integrate all the process parameters and develop predictive models for a key cell culture performance parameter. The model was also used to identify and rank process parameters according to their relevance in predicting process outcome. Evaluation of cell culture stage-specific models indicates that production performance can be reliably predicted days prior to harvest. Strong associations between several temporal parameters at various manufacturing stages and final process outcome were uncovered. This model-based data mining represents an important step forward in establishing process data-driven knowledge discovery in bioprocesses. Implementing this methodology on the manufacturing floor can facilitate real-time decision making and thereby improve the robustness of large-scale bioprocesses.
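
The combination of a kernel-based representation with support vector regression described above can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic data, the choice of an RBF kernel on each parameter's time profile, the unweighted averaging of per-parameter kernels, the helper names (rbf_kernel_per_parameter, combined_kernel) and all hyperparameter values are assumptions made for illustration only. The sketch uses NumPy and scikit-learn's SVR with a precomputed kernel.

```python
import numpy as np
from sklearn.svm import SVR


def rbf_kernel_per_parameter(A, B, gamma):
    """RBF kernel between time profiles of ONE process parameter.

    A has shape (n_a, n_time), B has shape (n_b, n_time).
    """
    sq_dist = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))  # clip tiny negative round-off


def combined_kernel(XA, XB, gamma=None):
    """Average the per-parameter kernels (an assumed, unweighted combination).

    XA has shape (n_a, n_params, n_time), XB has shape (n_b, n_params, n_time).
    """
    n_params, n_time = XA.shape[1], XA.shape[2]
    gamma = 1.0 / n_time if gamma is None else gamma
    K = np.zeros((XA.shape[0], XB.shape[0]))
    for p in range(n_params):
        K += rbf_kernel_per_parameter(XA[:, p, :], XB[:, p, :], gamma)
    return K / n_params


# Synthetic stand-in for archived runs: 60 runs, 5 parameters, 20 time points each.
rng = np.random.default_rng(0)
n_runs, n_params, n_time = 60, 5, 20
X = rng.normal(size=(n_runs, n_params, n_time))
# Hypothetical outcome driven by the first parameter's mean level, plus noise.
y = X[:, 0, :].mean(axis=1) + 0.1 * rng.normal(size=n_runs)

train, test = np.arange(0, 45), np.arange(45, n_runs)
K_train = combined_kernel(X[train], X[train])  # (n_train, n_train)
K_test = combined_kernel(X[test], X[train])    # (n_test, n_train)

# Epsilon-insensitive SVR on the precomputed, combined kernel.
model = SVR(kernel="precomputed", C=10.0, epsilon=0.05)
model.fit(K_train, y[train])
y_pred = model.predict(K_test)
```

Under the same assumptions, restricting the time axis to the points available at an earlier culture stage before building the kernels would be one way to mimic a stage-specific model, and refitting with one parameter's kernel left out would give a rough indication of that parameter's relevance.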
