Network Analysis of Differential Expression for the Identification of Disease-Causing Genes

Genetic studies (in particular linkage and association studies) identify chromosomal regions involved in a disease or phenotype of interest, but those regions often contain many candidate genes, only a few of which can be followed up for biological validation. Computational methods that prioritize the most promising candidates within a region have recently been proposed, but they are usually not applicable when little is known about the phenotype (few or no confirmed disease genes, fragmentary understanding of the biological cascades involved). We seek to overcome this limitation by replacing knowledge about the biological process with experimental data on differential gene expression between affected and healthy individuals. Viewing the problem from the perspective of a gene/protein network, we assess a candidate gene by the level of differential expression in its neighborhood, under the assumption that strong candidates tend to be surrounded by differentially expressed neighbors. We define a soft neighborhood in which each gene is given a contributing weight that decreases with its distance from the candidate gene on the protein network. To account for multiple paths between genes, we define the distance using the Laplacian exponential diffusion kernel. We score each candidate by aggregating the differential expression of its neighbors, weighted as a function of distance. We then rank candidates by p-values obtained through a randomization procedure. We illustrate our approach on four monogenic diseases and successfully prioritize the known disease-causing genes.
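
The scoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the diffusion parameter `beta`, the number of permutations `n_perm`, the use of absolute differential expression, and the choice of shuffling expression values across genes as the null model are all assumptions made for illustration. The kernel is taken to be exp(-beta * L), where L is the graph Laplacian of the network.

```python
import numpy as np


def diffusion_kernel(adjacency, beta=0.5):
    """Laplacian exponential diffusion kernel K = exp(-beta * L).

    `adjacency` is a symmetric (n, n) matrix of a gene/protein network.
    `beta` (hypothetical default) controls how far the weight diffuses.
    """
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A           # graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)     # L is symmetric
    # Matrix exponential through the eigendecomposition of L.
    return (eigvecs * np.exp(-beta * eigvals)) @ eigvecs.T


def neighborhood_scores(kernel, diff_expr):
    """Kernel-weighted sum of absolute differential expression per gene."""
    return kernel @ np.abs(np.asarray(diff_expr, dtype=float))


def permutation_pvalues(kernel, diff_expr, n_perm=1000, seed=0):
    """Empirical p-values from shuffling expression values across genes.

    The shuffling scheme is an assumed null model, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    d = np.abs(np.asarray(diff_expr, dtype=float))
    observed = kernel @ d
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        exceed += (kernel @ rng.permutation(d)) >= observed
    # Add-one correction keeps every p-value strictly positive.
    return (exceed + 1.0) / (n_perm + 1.0)


# Toy example: a path network of five genes, 0 - 1 - 2 - 3 - 4,
# with strong differential expression around gene 0.
toy_adjacency = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
])
toy_expression = np.array([5.0, 4.0, 0.0, 0.0, 0.0])

toy_kernel = diffusion_kernel(toy_adjacency, beta=0.5)
toy_scores = neighborhood_scores(toy_kernel, toy_expression)
toy_pvalues = permutation_pvalues(toy_kernel, toy_expression, n_perm=200)
```

Candidates within a region would then be ranked by ascending p-value. The eigendecomposition is convenient for small networks; for a genome-scale network a sparse or low-rank approximation of the kernel would likely be needed.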
