An OMIC biomarker detection algorithm TriVote and its application in methylomic biomarker detection.

AIM: Transcriptomic and methylomic patterns represent two major OMIC data sources that are influenced by both heritable genetic information and environmental factors, and they have been widely used as biomarkers for disease diagnosis and prognosis.

MATERIALS & METHODS: Modern transcriptomic and methylomic profiling technologies measure the status of tens of thousands or even millions of probed residues in the human genome, which poses a major computational challenge for existing feature selection algorithms. This study proposes a three-step feature selection algorithm, TriVote, to detect a subset of transcriptomic or methylomic residues with highly accurate binary classification performance.

RESULTS & CONCLUSION: TriVote outperforms both filter and wrapper feature selection algorithms, achieving higher classification accuracy with fewer features on 17 transcriptomes and two methylomes. The biological functions of the methylome biomarkers detected by TriVote are discussed in terms of their disease associations. An easy-to-use Python package is also released to facilitate further applications.
