Human and mouse gene structure: comparative analysis and application to exon prediction

We describe a novel analytical approach to gene recognition based on cross-species comparison. We first undertook a comparison of orthologous genomic loci from human and mouse, studying the extent of similarity in the number, size and sequence of exons and introns. We then developed an approach for recognizing genes within such orthologous regions by first aligning the regions using an iterative global alignment system and then identifying genes based on conservation of exonic features at aligned positions in both species. The alignment and gene recognition are performed by new programs called GLASS and ROSETTA, respectively. ROSETTA performed well at exact identification of coding exons in the 117 orthologous pairs tested.
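The two-stage idea in the abstract (align orthologous regions globally, then look for conserved stretches at aligned positions) can be illustrated with a toy sketch. This is not GLASS or ROSETTA: the alignment below is a textbook Needleman-Wunsch dynamic program rather than the iterative system the abstract mentions, and ungapped identity runs are only a crude stand-in for "conservation of exonic features". The function names, scoring values and `min_len` threshold are all assumptions made for illustration.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Textbook Needleman-Wunsch global alignment (illustrative scores only)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


def conserved_segments(top, bottom, min_len=6):
    """Return (start, end) half-open intervals, in coordinates of the first
    (ungapped) sequence, where aligned columns are identical for >= min_len bases.
    Hypothetical proxy for conserved candidate regions; not an exon model."""
    segments = []
    pos_a = 0      # index into the ungapped first sequence
    start = None   # start of the current identity run, if any
    for x, y in zip(top, bottom):
        if x == y and x != "-":
            if start is None:
                start = pos_a
        else:
            if start is not None and pos_a - start >= min_len:
                segments.append((start, pos_a))
            start = None
        if x != "-":
            pos_a += 1
    if start is not None and pos_a - start >= min_len:
        segments.append((start, pos_a))
    return segments
```

A real gene finder would score splice sites, reading frame and codon usage in both species rather than raw identity; the sketch only shows how aligned positions give a shared coordinate system in which such evidence can be combined.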
