The DNA sequence and comparative analysis of human chromosome 20

The finished sequence of human chromosome 20 comprises 59,187,298 base pairs (bp) and represents 99.4% of the euchromatic DNA. A single contig of 26 megabases (Mb) spans the entire short arm, and five contigs separated by gaps totalling 320 kb span the long arm of this metacentric chromosome. An additional 234,339 bp of sequence has been determined within the pericentromeric region of the long arm. We annotated 727 genes and 168 pseudogenes in the sequence. About 64% of these genes have a 5′ and a 3′ untranslated region and a complete open reading frame. Comparative analysis of the sequence of chromosome 20 to whole-genome shotgun-sequence data of two other vertebrates, the mouse Mus musculus and the puffer fish Tetraodon nigroviridis, provides an independent measure of the efficiency of gene annotation, and indicates that this analysis may account for more than 95% of all coding exons and almost all genes.
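The figures quoted in the abstract imply a few derived quantities that are not stated directly. The sketch below is a back-of-the-envelope check, not part of the original analysis. It assumes that the 99.4% coverage figure applies to the 59,187,298 bp of finished sequence and that "about 64%" applies to the 727 annotated genes. The function and variable names are illustrative, and because the percentages are rounded the results are approximate.

```python
# Figures taken from the abstract.
FINISHED_BP = 59_187_298        # finished sequence, in base pairs
EUCHROMATIC_COVERAGE = 0.994    # fraction of euchromatic DNA represented (rounded)
ANNOTATED_GENES = 727
ANNOTATED_PSEUDOGENES = 168
COMPLETE_STRUCTURE_FRACTION = 0.64  # "about 64%" (rounded)


def implied_euchromatic_bp(finished_bp: int, coverage: float) -> int:
    """Estimate total euchromatic length from finished length and coverage."""
    return round(finished_bp / coverage)


def summarize() -> dict:
    """Derive approximate quantities implied by the abstract's figures."""
    total_bp = implied_euchromatic_bp(FINISHED_BP, EUCHROMATIC_COVERAGE)
    finished_mb = FINISHED_BP / 1_000_000
    return {
        # Estimated total euchromatic length (assumption: coverage is exact).
        "implied_euchromatic_bp": total_bp,
        # Estimated euchromatic sequence not yet finished.
        "implied_unfinished_bp": total_bp - FINISHED_BP,
        # Approximate count of genes with both UTRs and a complete ORF.
        "genes_with_complete_structure": round(
            ANNOTATED_GENES * COMPLETE_STRUCTURE_FRACTION
        ),
        # Annotated genes per megabase of finished sequence.
        "genes_per_mb": ANNOTATED_GENES / finished_mb,
        # Pseudogenes as a fraction of all annotated loci.
        "pseudogene_fraction": ANNOTATED_PSEUDOGENES
        / (ANNOTATED_GENES + ANNOTATED_PSEUDOGENES),
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}")
```

The implied unfinished length comes out in the same range as the 320 kb of long-arm gaps mentioned in the abstract, which is consistent with the stated coverage given that 99.4% is a rounded value.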

P Deloukas | L H Matthews | J Ashurst | J Burton | J G Gilbert | M Jones | G Stavrides | J P Almeida | A K Babbage | C L Bagguley | J Bailey | K F Barlow | K N Bates | L M Beard | D M Beare | O P Beasley | C P Bird | S E Blakey | A M Bridgeman | A J Brown | D Buck | W Burrill | A P Butler | C Carder | N P Carter | J C Chapman | M Clamp | G Clark | L N Clark | S Y Clark | C M Clee | S Clegg | V E Cobley | R E Collier | R Connor | N R Corby | A Coulson | G J Coville | R Deadman | P Dhami | M Dunn | A G Ellington | J A Frankland | A Fraser | L French | P Garner | D V Grafham | C Griffiths | M N Griffiths | R Gwilliam | R E Hall | S Hammond | J L Harley | P D Heath | S Ho | J L Holden | P J Howden | E Huckle | A R Hunt | S E Hunt | K Jekosch | C M Johnson | D Johnson | M P Kay | A M Kimberley | A King | A Knights | G K Laird | S Lawlor | M H Lehvaslaiho | M Leversha | C Lloyd | D M Lloyd | J D Lovell | V L Marsh | S L Martin | L J McConnachie | K McLay | A A McMurray | S Milne | D Mistry | M J Moore | J C Mullikin | T Nickerson | K Oliver | A Parker | R Patel | T A Pearce | A I Peck | B J Phillimore | S R Prathalingam | R W Plumb | H Ramsay | C M Rice | M T Ross | C E Scott | H K Sehra | R Shownkeen | S Sims | C D Skuce | M L Smith | C Soderlund | C A Steward | J E Sulston | M Swann | N Sycamore | R Taylor | L Tee | D W Thomas | A Thorpe | A Tracey | A C Tromans | M Vaudin | M Wall | J M Wallis | S L Whitehead | P Whittaker | D L Willey | L Williams | S A Williams | L Wilming | P W Wray | T Hubbard | R M Durbin | D R Bentley | S Beck | J Rogers
