Random sample consensus combined with partial least squares regression (RANSAC-PLS) for microbial metabolomics data mining and phenotype improvement.

In recent years, high-throughput omics technologies have enabled a new class of strain engineering approaches, in which candidate gene targets for phenotype improvement are identified by comparing different strains or growth conditions at the omics level. Metabolomics, which measures the omics layer closest to the phenotype, lends itself naturally to this semi-rational approach. When the phenotype is quantitative, such as growth rate under stress, multivariate regression techniques such as partial least squares (PLS) are often used to identify metabolites correlated with it. However, linear models such as PLS assume a consistent metabolite-phenotype trend across all samples, and this assumption may not hold when the data contain outliers or multiple conflicting trends. To address this, we propose a data-mining strategy that uses random sample consensus (RANSAC) to select subsets of samples with consistent trends, on which more reliable regression models can be built. Applying this combination of RANSAC and PLS (RANSAC-PLS) to a dataset from a previous study (gas chromatography/mass spectrometry metabolomics data and 1-butanol tolerance of 19 yeast mutant strains) identified additional metabolites that were correlated with tolerance within certain subsets of the samples. The relevance of these metabolites to 1-butanol tolerance was then validated using single-deletion strains of the corresponding metabolic genes. These results show that RANSAC-PLS is a promising strategy for identifying metabolites that cannot be detected by conventional PLS modeling of the entire dataset and that provide additional leads for phenotype improvement.
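The abstract describes RANSAC-PLS only at a high level. The following is a minimal Python sketch of the general idea under stated assumptions, not the authors' implementation: PLS1 is fitted with the NIPALS algorithm on mean-centered data (no unit-variance scaling), each RANSAC iteration draws a random subset of fixed size, the consensus set is every sample whose absolute prediction error falls below a threshold, and the largest consensus set is kept and refitted, as in the original RANSAC formulation of Fischler and Bolles. The function names and the parameters `min_samples`, `threshold`, and `n_iter` are hypothetical; the study's actual subset size, inlier criterion, and metabolite-selection statistic are not given here.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit single-response PLS (PLS1) with NIPALS on mean-centered data.

    Assumption: mean-centering only, no autoscaling. Returns the regression
    coefficients plus the means used for centering.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                     # weights: covariance of X columns with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                  # nothing left in y to explain
            break
        w /= norm
        t = Xk @ w                        # latent scores
        tt = t @ t
        p = Xk.T @ t / tt                 # X loadings
        q = (yk @ t) / tt                 # y loading
        Xk = Xk - np.outer(t, p)          # deflate X and y
        yk = yk - q * t
        W.append(w)
        P.append(p)
        Q.append(q)
    if not W:
        return np.zeros(X.shape[1]), x_mean, y_mean
    W, P, Q = np.column_stack(W), np.column_stack(P), np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)  # B = W (P^T W)^-1 q
    return coef, x_mean, y_mean


def pls1_predict(X, model):
    coef, x_mean, y_mean = model
    return (X - x_mean) @ coef + y_mean


def ransac_pls(X, y, n_components=2, min_samples=5, threshold=0.1,
               n_iter=500, seed=0):
    """Hypothetical RANSAC wrapper around PLS1.

    Fits PLS1 to random subsets, scores each fit by the number of samples
    with absolute residual below `threshold` (the consensus set), keeps the
    largest consensus set, and refits PLS1 on it.
    """
    rng = np.random.default_rng(seed)
    best = np.array([], dtype=int)
    for _ in range(n_iter):
        subset = rng.choice(len(y), size=min_samples, replace=False)
        model = pls1_fit(X[subset], y[subset], n_components)
        residuals = np.abs(pls1_predict(X, model) - y)
        consensus = np.flatnonzero(residuals < threshold)
        if len(consensus) > len(best):
            best = consensus
    if len(best) < min_samples:
        raise ValueError("no consensus set large enough; loosen the threshold")
    return pls1_fit(X[best], y[best], n_components), best


# Toy data: 19 samples, 14 follow one linear trend, 5 follow a shifted trend.
rng = np.random.default_rng(1)
X = rng.normal(size=(19, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1]
y[14:] += 5.0
(coef, _, _), inliers = ransac_pls(X, y, n_components=3)
print(inliers, np.round(coef, 3))
```

On this toy example the consensus set recovers the 14 samples that share the main trend, and a PLS model refitted on them is not distorted by the five conflicting samples. With real metabolomics data the number of metabolites far exceeds the number of samples, so `n_components` would be kept small (for example, chosen by cross-validation) and metabolites would then be ranked within the consensus subset, for instance by coefficient magnitude or VIP score; which statistic the study used is not stated here.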
