Identifying network-based biomarkers of complex diseases from high-throughput data.

In this work, we review the main computational methods available for identifying biomarkers of complex diseases from high-throughput data. Emerging omics techniques provide powerful means of measuring thousands of molecules in cells in parallel. The resulting genomic, transcriptomic, proteomic, metabolomic and phenomic data provide comprehensive molecular and cellular information for detecting critical signals that can serve as biomarkers by classifying disease phenotypic states. Networks are often used to organize these profiles during biomarker identification, supporting the diagnosis, prognosis and therapy of complex diseases, as well as the deciphering of their mechanisms, from a systematic perspective. Here, we summarize some representative network-based bioinformatics methods to highlight the importance of computational strategies in biomarker discovery.
