A Machine Learning Pipeline for Identification of Discriminant Pathways

Identifying the molecular pathways most prone to disruption during a pathological process is a key task in network medicine and, more generally, in systems biology. This chapter describes a pipeline that couples a machine learning solution for molecular profiling with a recent network comparison method. The pipeline can identify changes occurring between specific sub-modules of networks built in a case-control biomarker study, singling out key groups of genes whose interactions are altered by an underlying condition. Different algorithms can be chosen to implement each step of the workflow. Three applications to genome-wide data are presented, concerning the susceptibility of children to air pollution and the early and late onset of Parkinson's and Alzheimer's diseases.
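The abstract leaves the concrete algorithm for each step open. As a rough illustration only, the sketch below wires the three stages together under explicit assumptions that are not taken from the chapter: a Welch t statistic stands in for the machine learning profiling step, each class gets a co-expression network built from soft-thresholded absolute Pearson correlations (in the spirit of weighted gene co-expression network analysis), and the Ipsen-Mikhailov spectral distance stands in for the network comparison method. Labels are assumed to be 0 (control) and 1 (case); all function names and the gene sets `P1` and `P2` are hypothetical. The published Ipsen-Mikhailov distance fixes the Lorentzian width `gamma` through a normalisation condition, whereas here it is simply a parameter.

```python
import numpy as np


def discriminant_scores(X, y):
    """Stand-in profiling step: absolute Welch t statistic per gene (labels 0/1)."""
    a, b = X[y == 0], X[y == 1]
    num = np.abs(a.mean(axis=0) - b.mean(axis=0))
    den = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return num / (den + 1e-12)


def coexpression_network(X, beta=6):
    """Weighted adjacency |Pearson r|**beta (soft threshold), without self-loops."""
    A = np.abs(np.corrcoef(X, rowvar=False)) ** beta
    np.fill_diagonal(A, 0.0)
    return A


def laplacian_frequencies(A):
    """Square roots of the Laplacian eigenvalues, omitting the smallest (zero) one."""
    L = np.diag(A.sum(axis=1)) - A
    lam = np.clip(np.linalg.eigvalsh(L), 0.0, None)  # clip tiny negative round-off
    return np.sqrt(lam[1:])


def lorentzian_density(omega, grid, gamma):
    """Sum of Lorentzians of half-width gamma, normalised to unit area on the grid."""
    rho = (gamma / ((grid[:, None] - omega[None, :]) ** 2 + gamma**2)).sum(axis=1)
    return rho / (rho.sum() * (grid[1] - grid[0]))


def ipsen_mikhailov(A1, A2, gamma=0.08, n_grid=4000):
    """Ipsen-Mikhailov spectral distance between two graphs on the same node set."""
    w1, w2 = laplacian_frequencies(A1), laplacian_frequencies(A2)
    grid = np.linspace(0.0, max(w1.max(), w2.max()) + 50 * gamma, n_grid)
    r1 = lorentzian_density(w1, grid, gamma)
    r2 = lorentzian_density(w2, grid, gamma)
    return float(np.sqrt(((r1 - r2) ** 2).sum() * (grid[1] - grid[0])))


def rank_pathways(X, y, pathways, top_k=None, beta=6, gamma=0.08, min_size=3):
    """Rank gene sets by the distance between their control and case networks."""
    order = np.argsort(discriminant_scores(X, y))[::-1]
    keep = set((order if top_k is None else order[:top_k]).tolist())
    ranking = []
    for name, genes in pathways.items():
        idx = [g for g in genes if g in keep]
        if len(idx) < min_size:
            continue  # too few selected genes to build a meaningful network
        A0 = coexpression_network(X[y == 0][:, idx], beta)
        A1 = coexpression_network(X[y == 1][:, idx], beta)
        ranking.append((name, ipsen_mikhailov(A0, A1, gamma)))
    return sorted(ranking, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 60, 10
    X0 = rng.normal(size=(n, p))  # controls: ten independent genes
    X1 = rng.normal(size=(n, p))  # cases
    # genes 0-4 share a latent driver in cases only, so their wiring changes
    X1[:, :5] = rng.normal(size=(n, 1)) + 0.3 * rng.normal(size=(n, 5))
    X = np.vstack([X0, X1])
    y = np.array([0] * n + [1] * n)
    pathways = {"P1": [0, 1, 2, 3, 4], "P2": [5, 6, 7, 8, 9]}  # hypothetical gene sets
    for name, dist in rank_pathways(X, y, pathways):
        print(f"{name}\t{dist:.3f}")
```

On the synthetic data, `P1` is ranked above `P2` because only its interaction structure differs between the two classes. In a real study one would also assess each distance against a null distribution, for instance by permuting class labels, before calling a pathway discriminant.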
