Early response index: a statistic to discover potential early stage disease biomarkers

Background: Identifying disease-correlated features early, before disease progression has significantly changed the abundance of a large number of molecules, helps biologists develop biomarkers for early disease diagnosis. Disease-correlated features show relatively small abundance changes at early stages. Finding them in high-throughput data with existing bioinformatic tools is challenging because the technology suffers from limited dynamic range and significant noise. Most existing biomarker discovery algorithms can detect only molecules with large abundance changes and frequently miss early diagnostic markers.

Results: We present a new statistic, the early response index (ERI), to prioritize disease-correlated molecules as potential early biomarkers. Rather than measuring the classification accuracy of a feature on its own, ERI measures the average improvement in classification accuracy attainable by a feature when it is combined with other features for classification. ERI is more sensitive to abundance changes than other ranking statistics. We show that ERI significantly outperforms SAM and Localfdr in detecting early responding molecules in a proteomics study of a mouse model of multiple sclerosis. Importantly, ERI detected many disease-relevant proteins before those algorithms detected them at a later time point.

Conclusions: The ERI method is more sensitive for detecting significant features during the early stages of disease development. It potentially has higher specificity for biomarker discovery and can be used to identify the critical time frame for disease intervention.
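The idea of scoring a feature by the accuracy it adds, rather than by its standalone accuracy, can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names `loo_accuracy` and `early_response_index` are hypothetical, each feature is paired with one other feature at a time, the classifier is a simple nearest-centroid rule, and accuracy is estimated by leave-one-out cross-validation. The published method may differ in all of these choices.

```python
from statistics import mean


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier (an assumed
    stand-in classifier) restricted to the given feature indices."""
    correct = 0
    for held in range(len(X)):
        # Per-class centroids computed without the held-out sample.
        centroids = {}
        for label in set(y):
            rows = [X[k] for k in range(len(X)) if k != held and y[k] == label]
            if rows:
                centroids[label] = [mean(r[f] for r in rows) for f in features]
        point = [X[held][f] for f in features]
        # Predict the class whose centroid is closest (squared Euclidean distance).
        pred = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += pred == y[held]
    return correct / len(X)


def early_response_index(X, y, i):
    """Illustrative ERI-style score: mean accuracy gain obtained by adding
    feature i to each other single feature (pairing scheme is an assumption)."""
    others = [j for j in range(len(X[0])) if j != i]
    if not others:
        raise ValueError("need at least two features to compute a gain")
    gains = [loo_accuracy(X, y, [i, j]) - loo_accuracy(X, y, [j]) for j in others]
    return mean(gains)
```

Here `X` is a list of samples (each a list of feature abundances) and `y` is the list of class labels, for example disease versus control. Ranking features by `early_response_index(X, y, i)` in descending order gives a prioritized candidate list under these assumptions. The nearest-centroid rule does not rescale features, so abundances would need to be normalized beforehand for the gains to be comparable.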
