A similarity-based method for genome-wide prediction of disease-relevant human genes

MOTIVATION: A method is presented for predicting disease-relevant human genes from the phenotypic appearance of a query disease. Diseases of known genetic origin are clustered according to their phenotypic similarity. Each cluster entry consists of a disease and its underlying disease gene. Potential disease genes from the human genome are scored by their functional similarity to the known disease genes in those clusters that are phenotypically similar to the query disease.

RESULTS: To assess the approach, a leave-one-out cross-validation of 878 diseases from the OMIM database is performed, using 10672 candidate genes from the human genome. Depending on the applied parameters, in roughly one-third of cases the true solution is contained within the top-scoring 3% of predictions, and in two-thirds of cases it is contained within the top-scoring 15% of predictions. The prediction results can be used either to identify target genes when searching for a mutation in monogenic diseases, or to select loci for genotyping experiments in genetically complex diseases.
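The scoring idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the Jaccard overlap of annotation sets as the functional similarity, the weighting by phenotypic similarity, the max-aggregation, and all gene and term names are assumptions chosen only to make the example concrete.

```python
from typing import Dict, FrozenSet, List, Tuple

Annotations = Dict[str, FrozenSet[str]]


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard overlap of two annotation sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score_candidates(
    candidates: List[str],
    annotations: Annotations,
    similar_clusters: List[Tuple[float, List[str]]],
) -> List[Tuple[str, float]]:
    """Rank candidate genes by similarity to known disease genes.

    similar_clusters holds (phenotypic_similarity_to_query, known_genes)
    pairs. Weighting by phenotypic similarity and taking the maximum
    are assumptions of this sketch, not details given in the abstract.
    """
    empty: FrozenSet[str] = frozenset()
    scored = []
    for gene in candidates:
        best = 0.0
        for weight, known_genes in similar_clusters:
            for known in known_genes:
                sim = jaccard(annotations.get(gene, empty),
                              annotations.get(known, empty))
                best = max(best, weight * sim)
        scored.append((gene, best))
    # Highest score first; sorted() is stable, so ties keep input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def rank_fraction(ranking: List[Tuple[str, float]], true_gene: str) -> float:
    """Fraction of the ranked list at or above the true gene."""
    position = [gene for gene, _ in ranking].index(true_gene) + 1
    return position / len(ranking)


# Hypothetical toy data: three candidates and one known disease gene
# from a cluster that is phenotypically similar to the query disease.
annotations = {
    "GENE_A": frozenset({"term1", "term2"}),
    "GENE_B": frozenset({"term3"}),
    "GENE_C": frozenset({"term1", "term2", "term3"}),
    "KNOWN_1": frozenset({"term1", "term2"}),
}
similar_clusters = [(0.9, ["KNOWN_1"])]
ranking = score_candidates(["GENE_A", "GENE_B", "GENE_C"],
                           annotations, similar_clusters)
```

In a leave-one-out setting, each disease would in turn be removed from its cluster and used as the query, and `rank_fraction` would report how far down the ranked candidate list its true gene appears. That quantity corresponds to the "top-scoring 3%" and "top-scoring 15%" figures quoted in the results.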
