A Machine Learning Pipeline for Discriminant Pathways Identification

Identifying the molecular pathways more prone to disruption during a pathological process is a key task in network medicine and, more generally, in systems biology. In this work we propose a pipeline that couples a machine learning solution for molecular profiling with a recent network comparison method. The pipeline can identify changes occurring between specific sub-modules of networks built in a case-control biomarker study, discriminating key groups of genes whose interactions are modified by an underlying condition. The proposal is independent of the classification algorithm used. Two applications on genome-wide data are presented, concerning children's susceptibility to air pollution and early and late onset of Parkinson’s disease.
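The network-comparison step can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the gene groups are taken as given (in the pipeline they would come from the molecular profiling step), networks are built by thresholding absolute Pearson correlation, and the comparison uses a plain Euclidean distance between Laplacian spectra as a stand-in for the network comparison method the abstract refers to. The function names, the threshold, and the synthetic data are all hypothetical.

```python
import numpy as np


def coexpression_network(X, threshold=0.5):
    """Weighted adjacency matrix from absolute Pearson correlation.

    X has shape (samples, genes). Edges weaker than `threshold`
    (a hypothetical cut-off) are set to zero.
    """
    A = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(A, 0.0)
    A[A < threshold] = 0.0
    return A


def laplacian_spectrum(A):
    """Sorted eigenvalues of the graph Laplacian L = D - A."""
    L = np.diag(A.sum(axis=1)) - A
    return np.sort(np.linalg.eigvalsh(L))


def spectral_distance(A, B):
    """Euclidean distance between Laplacian spectra (illustrative stand-in)."""
    return float(np.linalg.norm(laplacian_spectrum(A) - laplacian_spectrum(B)))


def pathway_distances(X, y, pathways, threshold=0.5):
    """Compare case and control networks separately for each gene group."""
    cases, controls = X[y == 1], X[y == 0]
    return {
        name: spectral_distance(
            coexpression_network(cases[:, genes], threshold),
            coexpression_network(controls[:, genes], threshold),
        )
        for name, genes in pathways.items()
    }


# Synthetic data: genes 0-2 are co-regulated in controls only,
# genes 3-5 are independent in both groups.
rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(2 * n, 6))
y = np.array([0] * n + [1] * n)
shared = rng.normal(size=(n, 1))
X[:n, :3] = shared + 0.3 * rng.normal(size=(n, 3))

pathways = {"A": [0, 1, 2], "B": [3, 4, 5]}  # hypothetical gene groups
distances = pathway_distances(X, y, pathways)
ranked = sorted(distances, key=distances.get, reverse=True)
print(ranked, distances)
```

Ranking gene groups by the distance between their case and control networks is one simple way to flag the groups whose interactions differ most between conditions; any network inference method or distance could be swapped in without changing the overall structure.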
