Gene finding for the helical cytokines

MOTIVATION: Gene finding remains an open problem well after the sequencing of the human genome. The low gene-level sensitivity of current methods is a particular problem for divergent protein families, because fairly accurate exon assemblies are required before sensitive fold recognition algorithms can be applied. This paper presents a new genomic threading algorithm that integrates the gene finding and fold recognition steps into a single process. The method is applicable to evolutionarily divergent protein families that have retained some trace of their common ancestry in their gene structure: the number and phase of introns, the sizes of exons, and the placement of structural elements on specific exons. Such conserved structural signals may be visible despite dramatic evolution of protein sequence.

RESULTS: The method is evaluated on the family of helical cytokines by cross-validation sensitivity analysis. The method has also been applied to all intergenic regions of the human genome, and an expression and cloning approach has been coupled with the predictions of the method. Two genes discovered by this method are discussed.

SUPPLEMENTARY INFORMATION: All data used and the results obtained in the cross-validation analysis are available at http://www.soi.city.ac.uk/~conklin/papers/GT/
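
To make the conserved structural signals named under MOTIVATION concrete, the following is a minimal sketch of how a candidate exon assembly might be compared with a known family member on intron number, intron phase, and exon size. It is not the genomic threading algorithm of the paper, which is not specified in this abstract. The names `ExonStructure`, `intron_phases_from_sizes`, `structure_similarity`, the `size_tolerance` parameter, and all exon sizes are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


def intron_phases_from_sizes(exon_sizes: Sequence[int]) -> List[int]:
    """Derive intron phases from coding exon lengths (in nucleotides).

    An intron's phase is the number of bases of an incomplete codon that
    precede it: the cumulative coding length modulo 3.
    """
    phases = []
    coding_length = 0
    for size in exon_sizes[:-1]:  # no intron follows the last exon
        coding_length += size
        phases.append(coding_length % 3)
    return phases


@dataclass(frozen=True)
class ExonStructure:
    """Hypothetical container for the coding exon sizes of one gene."""
    exon_sizes: Sequence[int]

    @property
    def intron_phases(self) -> List[int]:
        return intron_phases_from_sizes(self.exon_sizes)


def structure_similarity(candidate: ExonStructure,
                         template: ExonStructure,
                         size_tolerance: float = 0.5) -> float:
    """Score in [0, 1] for agreement of gene structure (illustrative only).

    Returns 0.0 when the exon counts differ. Otherwise counts matching
    intron phases and exon sizes within a relative tolerance of the
    template size, and normalises by the number of features compared.
    """
    if len(candidate.exon_sizes) != len(template.exon_sizes):
        return 0.0
    phase_matches = sum(
        c == t for c, t in zip(candidate.intron_phases, template.intron_phases)
    )
    size_matches = sum(
        abs(c - t) <= size_tolerance * t
        for c, t in zip(candidate.exon_sizes, template.exon_sizes)
    )
    n_features = len(template.intron_phases) + len(template.exon_sizes)
    return (phase_matches + size_matches) / n_features


# Made-up exon sizes, not taken from any real cytokine gene.
template = ExonStructure(exon_sizes=[100, 50, 152])
same_phases = ExonStructure(exon_sizes=[106, 47, 160])
shifted_phases = ExonStructure(exon_sizes=[101, 50, 152])
fewer_exons = ExonStructure(exon_sizes=[150, 152])

print(structure_similarity(same_phases, template))
print(structure_similarity(shifted_phases, template))
print(structure_similarity(fewer_exons, template))
```

Because the score uses no sequence similarity at all, it illustrates why such a signal can survive extensive sequence divergence. A practical method would need to combine it with splice-site and fold-recognition evidence, as the abstract describes.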
