Multi-TGDR, a multi-class regularization method, identifies the metabolic profiles of hepatocellular carcinoma and cirrhosis infected with hepatitis B or hepatitis C virus

Background: Over the last decade, metabolomics has evolved into a mainstream enterprise utilized by many laboratories globally. Like other “omics” data, metabolomics data typically have a small sample size relative to the number of features evaluated. Thus, selecting an optimal subset of features with a supervised classifier is imperative. We extended an existing feature selection algorithm, threshold gradient descent regularization (TGDR), to handle multi-class classification of “omics” data, and proposed two such extensions, referred to as multi-TGDR. Both multi-TGDR frameworks were used to analyze a metabolomics dataset that compares the metabolic profiles of hepatocellular carcinoma (HCC) in patients infected with hepatitis B virus (HBV) or hepatitis C virus (HCV) with those of cirrhosis induced by HBV/HCV infection; the goal was to improve early-stage diagnosis of HCC.

Results: We applied two multi-TGDR frameworks, which determine TGDR thresholds either globally across classes or locally for each class, to the HCC metabolomics data. The multi-TGDR global model selected 45 metabolites with a 0% misclassification rate (the error rate on the training data) and a 3.82% 5-fold cross-validation (CV-5) predictive error rate. The multi-TGDR local model selected 48 metabolites with a 0% misclassification rate and a 5.34% CV-5 error rate.

Conclusions: One important advantage of multi-TGDR local is that it allows inference about which features are related specifically to a given class or classes. Thus, we recommend that multi-TGDR local be used, because it has similar predictive performance and requires the same computing time as multi-TGDR global, but may also provide class-specific inference.
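
The difference between thresholding globally across classes and locally for each class can be illustrated with a minimal Python sketch. It assumes the usual TGDR convention in which only coefficients whose absolute gradient reaches a fraction `tau` of the largest absolute gradient are updated at each iteration. The function names, the step size, and the toy gradient matrix are hypothetical and are not taken from the paper; this is not the authors' implementation.

```python
import numpy as np

def threshold_mask(grad, tau, mode="global"):
    """Select which coefficients are updated in one thresholded step.

    grad : array of shape (n_classes, n_features), gradient of the objective
    tau  : threshold in [0, 1]; larger values update fewer coefficients
    mode : "global" compares against one maximum over all classes,
           "local" compares against a separate maximum for each class
    (Assumed convention, labeled as such: update where |g| >= tau * max|g|.)
    """
    abs_g = np.abs(grad)
    if mode == "global":
        ref = abs_g.max()                        # single reference value
    elif mode == "local":
        ref = abs_g.max(axis=1, keepdims=True)   # one reference per class
    else:
        raise ValueError("mode must be 'global' or 'local'")
    return abs_g >= tau * ref

def tgdr_step(beta, grad, tau, step=0.01, mode="global"):
    """One gradient step that moves only the selected coefficients."""
    return beta + step * grad * threshold_mask(grad, tau, mode)

# Toy gradient for 2 classes and 3 features (made-up numbers).
grad = np.array([[4.0, 1.0, 0.5],
                 [0.2, 1.0, 0.1]])

global_mask = threshold_mask(grad, tau=0.5, mode="global")
local_mask = threshold_mask(grad, tau=0.5, mode="local")
beta_global = tgdr_step(np.zeros_like(grad), grad, tau=0.5, mode="global")
```

In this toy case the global rule updates only the single largest gradient, whereas the local rule also updates one coefficient in the second class, which is the kind of class-specific selection the Conclusions refer to.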
