Structural analysis of Arabidopsis thaliana chromosome 5. I. Sequence features of the 1.6 Mb regions covered by twenty physically assigned P1 clones.

A total of 20 P1 clones, with an average insert size of 80 kb and each containing one or more markers specifically mapped on chromosome 5, were isolated from a P1 library of the Arabidopsis thaliana genome. Their nucleotide sequences were determined by a shotgun-based strategy and precisely located on a separately constructed physical map of chromosome 5. The sequenced regions totaled 1,621,245 bp. By comparison with the sequences in protein and EST databases and analysis with computer programs for gene modeling, a total of 347 potential protein-coding genes and/or gene segments with known or predicted functions were identified. The positions of exons that do not exhibit any similarity to known genes were also predicted. The average density of the genes and/or gene segments assigned so far was 1 gene per 4,672 bp. Introns were identified in approximately 78% of the potential genes, and the average number and length of the introns per gene were 3.7 and 161 bp, respectively. The transcription level of the predicted genes was roughly monitored by counting the numbers of identified Arabidopsis ESTs. The sequence data and gene information are available through the World Wide Web at http://www.kazusa.or.jp/arabi/.
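The summary figures in the abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only: the function and constant names are hypothetical, and it assumes the reported density is the total sequenced length divided by the number of assigned genes and/or gene segments, which the abstract implies but does not state outright.

```python
# Figures taken directly from the abstract.
TOTAL_SEQUENCED_BP = 1_621_245   # total length of the sequenced regions
ASSIGNED_GENES = 347             # potential genes and/or gene segments
P1_CLONES = 20                   # number of sequenced P1 clones


def bp_per_unit(total_bp: int, count: int) -> float:
    """Return the average number of base pairs per counted unit."""
    if count <= 0:
        raise ValueError("count must be positive")
    return total_bp / count


# Assumed definition: density = total sequenced length / assigned genes.
gene_density = bp_per_unit(TOTAL_SEQUENCED_BP, ASSIGNED_GENES)

# Average sequenced length per clone. Any overlap between clones is
# ignored here, so this is only a rough check against the ~80 kb figure.
bp_per_clone = bp_per_unit(TOTAL_SEQUENCED_BP, P1_CLONES)

print(f"{gene_density:,.0f} bp per gene")
print(f"{bp_per_clone / 1000:.1f} kb per clone")
```

Under that assumption the density rounds to 4,672 bp per gene, matching the reported value, and the per-clone average of about 81 kb is consistent with the stated average insert size of 80 kb.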
