BioGranat-IG: a network analysis tool to suggest mechanisms of genetic heterogeneity from exome-sequencing data

MOTIVATION: Recent exome-sequencing studies have successfully identified disease-causing sequence variants for several rare monogenic diseases by examining variants common to a group of patients. However, current data analysis strategies cope poorly with confounding factors such as genetic heterogeneity, incomplete penetrance, individuals lacking data and the involvement of several genes.

RESULTS: We introduce BioGranat-IG, an analysis strategy that incorporates the information contained in biological networks into the analysis of exome-sequencing data. To identify genes that may have a disease-causing role, we label each node of the network with the individuals who carry a sequence variant in that gene, and then identify small subnetworks linked to all or most individuals. Using simulated exome-sequencing data, we demonstrate that BioGranat-IG is able to recover the genes responsible for two diseases known to be caused by variants in an underlying complex. We also examine the performance of BioGranat-IG under various conditions likely to be faced by the user, and show that its network-based approach is more powerful than a set-cover-based approach.
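The idea of labelling network nodes with individuals and then searching for a small connected subnetwork that covers all or most of them can be illustrated with a minimal sketch. This is not the BioGranat-IG implementation: the function name `greedy_connected_cover`, its parameters, the greedy growth heuristic and the toy gene and patient identifiers are all assumptions made for illustration. The sketch also never adds a linker gene that carries no new variant, which a real connected-set-cover search might need to do.

```python
from typing import Dict, List, Optional, Set


def greedy_connected_cover(
    adjacency: Dict[str, Set[str]],   # gene -> neighbouring genes (hypothetical network)
    carriers: Dict[str, Set[str]],    # gene -> individuals carrying a variant in it
    min_covered: int,                 # how many individuals the subnetwork must cover
    max_size: int = 5,                # upper bound on subnetwork size
) -> Optional[List[str]]:
    """Greedy heuristic: grow a connected subnetwork from each seed gene."""
    best: Optional[List[str]] = None
    for seed in sorted(carriers):
        if not carriers[seed]:
            continue
        subnet = [seed]
        covered = set(carriers[seed])
        while len(covered) < min_covered and len(subnet) < max_size:
            # Genes adjacent to the current subnetwork but not yet in it.
            frontier = {
                n for g in subnet for n in adjacency.get(g, set())
            } - set(subnet)
            # Pick the neighbour that adds the most uncovered individuals
            # (ties broken alphabetically because the list is sorted).
            candidate = max(
                sorted(frontier),
                key=lambda g: len(carriers.get(g, set()) - covered),
                default=None,
            )
            if candidate is None or not (carriers.get(candidate, set()) - covered):
                break  # no neighbour improves coverage; give up on this seed
            subnet.append(candidate)
            covered |= carriers[candidate]
        # Keep the smallest subnetwork that reaches the coverage target.
        if len(covered) >= min_covered and (best is None or len(subnet) < len(best)):
            best = subnet
    return best


# Toy data (invented): G1-G2-G3 form a connected complex, X-Y is unrelated.
toy_adjacency = {
    "G1": {"G2"}, "G2": {"G1", "G3"}, "G3": {"G2"},
    "X": {"Y"}, "Y": {"X"},
}
toy_carriers = {
    "G1": {"p1", "p2"}, "G2": {"p3"}, "G3": {"p4"}, "X": {"p1", "p3"},
}

print(greedy_connected_cover(toy_adjacency, toy_carriers, min_covered=4))
print(greedy_connected_cover(toy_adjacency, toy_carriers, min_covered=3))
```

In this toy case no single gene is mutated in every patient, so a search for one shared gene finds nothing, while the connected subnetwork G1, G2, G3 covers all four patients. Lowering `min_covered` to 3 mimics the "all or most individuals" relaxation and returns a smaller subnetwork.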
