A new web-based data mining tool for the identification of candidate genes for human genetic disorders

Identifying the gene underlying a human genetic disorder can be difficult and time-consuming. Typically, positional data delimit a chromosomal region that contains between 20 and 200 genes. The choice then lies between sequencing large numbers of genes and setting priorities by combining the positional data with expression and phenotype data held in various internet databases. This process of examining positional candidates for functional clues can be performed in many different ways, depending on the investigator's knowledge and experience. Here, we report on a new tool called the GeneSeeker, which gathers and combines positional data and expression/phenotype data in an automated way from nine different web-based databases. The result is a quick overview of promising candidate genes in the region of interest. The GeneSeeker system is built in a modular fashion, allowing databases to be added or removed easily as required. Databases are searched directly through the web, which obviates the need for data warehousing. To evaluate the GeneSeeker tool, we analysed syndromes whose causative genes are already known. For each of 10 syndromes, the GeneSeeker programme generated a shortlist that contained a significantly reduced number of candidate genes from the critical region, yet still included the causative gene. On average, a list of 163 genes based on position alone was reduced to a more manageable list of 22 genes based on position and expression or phenotype information. We are currently expanding the tool by adding other databases. The GeneSeeker is available via its web interface (http://www.cmbi.kun.nl/GeneSeeker/).
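The abstract describes the GeneSeeker only in terms of its inputs and outputs. The following Python sketch illustrates the two ideas it names: a modular set of database sources, and a shortlist built by combining position with expression or phenotype evidence. It is not the GeneSeeker's code. The names `AnnotationSource`, `InMemorySource`, `CandidateSearch`, `matching_genes` and `shortlist` are hypothetical. The sources are offline, in-memory stand-ins rather than live web queries. The rule that a positional candidate is kept when at least one source supports it is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Protocol


class AnnotationSource(Protocol):
    """Hypothetical interface for one expression or phenotype database."""

    name: str

    def matching_genes(self, genes: list[str], criteria: set[str]) -> set[str]:
        """Return the subset of `genes` annotated with any of `criteria`."""
        ...


@dataclass
class InMemorySource:
    """Offline stand-in for a web-queried database (illustration only)."""

    name: str
    annotations: dict[str, set[str]]  # gene symbol -> tissue/phenotype terms

    def matching_genes(self, genes: list[str], criteria: set[str]) -> set[str]:
        # A gene matches if any of its annotation terms is in the criteria.
        return {g for g in genes if self.annotations.get(g, set()) & criteria}


@dataclass
class CandidateSearch:
    """Combine a positional gene list with expression/phenotype evidence."""

    sources: list[AnnotationSource] = field(default_factory=list)

    def add_source(self, source: AnnotationSource) -> None:
        # Modularity: plugging in another database is a single append.
        self.sources.append(source)

    def remove_source(self, name: str) -> None:
        # Drop every source registered under this name.
        self.sources = [s for s in self.sources if s.name != name]

    def shortlist(
        self, region_genes: list[str], criteria: set[str]
    ) -> dict[str, list[str]]:
        """Keep region genes supported by at least one source.

        Maps each kept gene to the names of its supporting sources,
        preserving the positional order of `region_genes`.
        """
        support: dict[str, list[str]] = {}
        for source in self.sources:
            for gene in source.matching_genes(region_genes, criteria):
                support.setdefault(gene, []).append(source.name)
        return {g: support[g] for g in region_genes if g in support}


if __name__ == "__main__":
    expr = InMemorySource("expression_db", {"GENE_A": {"limb bud"}, "GENE_C": {"liver"}})
    pheno = InMemorySource("phenotype_db", {"GENE_B": {"limb bud"}, "GENE_A": {"limb"}})
    search = CandidateSearch([expr, pheno])
    region = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
    # Prints {'GENE_A': ['expression_db', 'phenotype_db'], 'GENE_B': ['phenotype_db']}
    print(search.shortlist(region, {"limb bud", "limb"}))
```

Each source only needs a `name` and a `matching_genes` method, so adding or dropping a database is a single `add_source` or `remove_source` call. This loosely mirrors the modular design described above without claiming to reproduce it. Recording which sources support each gene keeps the shortlist explainable to the investigator.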
