A computational pipeline for the development of multi-marker bio-signature panels and ensemble classifiers

Background

Biomarker panels derived separately from genomic and proteomic data, using a variety of computational methods, have demonstrated promising classification performance in various diseases. An open question is how to create effective proteo-genomic panels. The framework of ensemble classifiers has been applied successfully in various analytical domains to combine classifiers so that the performance of the ensemble exceeds that of the individual classifiers. Using blood-based diagnosis of acute renal allograft rejection as a case study, we address the following question in this paper: Can acute rejection classification performance be improved by combining individual genomic and proteomic classifiers in an ensemble?

Results

The first part of the paper presents a computational biomarker development pipeline for genomic and proteomic data. The pipeline begins with data acquisition (e.g., from bio-samples to microarray data), proceeds through quality control and statistical analysis and mining of the data, and ends with various forms of validation. The pipeline ensures that the various classifiers to be combined later in an ensemble are diverse and adequate for clinical use. Five mRNA genomic and five proteomic classifiers were developed independently using single time-point blood samples from 11 acute-rejection and 22 non-rejection renal transplant patients. The second part of the paper examines five ensembles ranging in size from two to ten individual classifiers. Ensemble performance is characterized by area under the curve (AUC), sensitivity, and specificity. These measures are derived from the probabilities of acute rejection produced by the individual classifiers in the ensemble, combined with one of two aggregation methods: (1) Average Probability or (2) Vote Threshold. One ensemble demonstrated superior performance and improved sensitivity and AUC beyond the best values observed for any of the individual classifiers in the ensemble, while staying within the range of observed specificity. The Vote Threshold aggregation method achieved improved sensitivity for all five ensembles, but typically at the cost of decreased specificity.

Conclusion

Proteo-genomic biomarker ensemble classifiers show promise in the diagnosis of acute renal allograft rejection and can improve classification performance beyond that of individual genomic or proteomic classifiers alone. Validation of our results in an international multicenter study is currently underway.
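The two aggregation methods named in the abstract can be illustrated with a minimal sketch. The abstract does not state the cutoffs or vote counts the authors used, so the 0.5 probability cutoff, the `min_votes` parameter, and all function names below are assumptions chosen for illustration only, not the paper's implementation.

```python
from typing import Sequence, Tuple


def average_probability(probs: Sequence[float], cutoff: float = 0.5) -> bool:
    """Call acute rejection when the mean of the classifiers' probabilities
    reaches the cutoff (0.5 is an assumed value)."""
    return sum(probs) / len(probs) >= cutoff


def vote_threshold(probs: Sequence[float], min_votes: int = 1,
                   cutoff: float = 0.5) -> bool:
    """Call acute rejection when at least `min_votes` classifiers each
    predict rejection (both parameters are assumed values)."""
    votes = sum(p >= cutoff for p in probs)
    return votes >= min_votes


def sensitivity_specificity(predicted: Sequence[bool],
                            actual: Sequence[bool]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    Assumes both classes are present in `actual`."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    tn = sum((not p) and (not a) for p, a in zip(predicted, actual))
    fp = sum(p and (not a) for p, a in zip(predicted, actual))
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical probabilities from three classifiers for one patient.
patient = [0.9, 0.2, 0.3]
avg_call = average_probability(patient)          # mean is about 0.47
vote_call = vote_threshold(patient, min_votes=1)  # one classifier votes yes
```

In this made-up case a single confident classifier is enough for a low vote threshold to flag rejection, while the averaged probability stays below the cutoff. That is consistent with the reported pattern that Vote Threshold tends to raise sensitivity at some cost in specificity.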
