Diploid genome reconstruction of Ciona intestinalis and comparative analysis with Ciona savignyi.

One of the main goals in genome sequencing projects is to determine a haploid consensus sequence even when clone libraries are constructed from homologous chromosomes. However, it has been noticed that haplotypes can be inferred from genome assemblies by investigating phase conservation in sequenced reads. In this study, we seek to infer haplotypes, a diploid consensus sequence, from the genome assembly of an organism, Ciona intestinalis. The Ciona intestinalis genome is an ideal resource from which haplotypes can be inferred because of the high polymorphism rate (1.2%). The haplotype estimation scheme consists of polymorphism detection and phase estimation. The core step of our method is a Gibbs sampling procedure. The mate-pair information from two-end sequenced clone inserts is exploited to provide long-range continuity. We estimate the polymorphism rate of Ciona intestinalis to be 1.2% and 1.5%, according to two different polymorphism counting schemes. The distribution of heterozygosity number is well fit by a compound Poisson distribution. The N50 length of haplotype segments is 37.9 kb in our assembly, while the N50 scaffold length of the Ciona intestinalis assembly is 190 kb. We also infer diploid gene sequences from haplotype segments. According to our reconstruction, 85.4% of predicted gene sequences are continuously covered by single haplotype segments. Our results indicate 97% accuracy in haplotype estimation, based on a simulated data set. We conduct a comparative analysis with Ciona savignyi, and discover interesting patterns of conserved DNA elements in chordates.

[1]  M. Nordborg,et al.  Coalescent Theory , 2019, Handbook of Statistical Genomics.

[2]  J. Huelsenbeck,et al.  Effect of nonindependent substitution on phylogenetic accuracy. , 1999, Systematic biology.

[3]  L Peltonen,et al.  Minisequencing: a specific tool for DNA analysis and diagnostics on oligonucleotide arrays. , 1997, Genome research.

[4]  Shibu Yooseph,et al.  Haplotyping as Perfect Phylogeny: A Direct Approach , 2003, J. Comput. Biol..

[5]  Paramvir S. Dehal,et al.  Whole-Genome Shotgun Assembly and Analysis of the Genome of Fugu rubripes , 2002, Science.

[6]  Christopher D. Brown,et al.  Noncoding regulatory sequences of Ciona exhibit strong correspondence between evolutionary constraint and functional importance. , 2004, Genome research.

[7]  D. Mccormick Sequence the Human Genome , 1986, Bio/Technology.

[8]  C. Nusbaum,et al.  Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome. , 1998, Science.

[9]  Jong Hyun Kim,et al.  Haplotype Reconstruction from SNP Alignment , 2004, J. Comput. Biol..

[10]  Randall A. Bolanos,et al.  Whole-genome shotgun assembly and comparison of human genome assemblies , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[11]  Edward M. Rubin,et al.  Megabase deletions of gene deserts result in viable mice , 2004, Nature.

[12]  C. V. Jongeneel,et al.  Numerous potentially functional but non-genic conserved sequences on human chromosome 21 , 2002, Nature.

[13]  E. Mauceli,et al.  Whole-genome sequence assembly for mammalian genomes: Arachne 2. , 2003, Genome research.

[14]  Russell Schwartz,et al.  SNPs Problems, Complexity, and Algorithms , 2001, ESA.

[15]  Hui-Hsien Chou,et al.  DNA sequence quality trimming and vector removal , 2001, Bioinform..

[16]  Zhaohui S. Qin,et al.  Bayesian haplotype inference for multiple linked single-nucleotide polymorphisms. , 2002, American journal of human genetics.

[17]  J. V. Moran,et al.  Initial sequencing and analysis of the human genome. , 2001, Nature.

[18]  A. Clark,et al.  Inference of haplotypes from PCR-amplified samples of diploid populations. , 1990, Molecular biology and evolution.

[19]  D. Haussler,et al.  Ultraconserved Elements in the Human Genome , 2004, Science.

[20]  M. Waterman,et al.  The accuracy of DNA sequences: estimating sequence quality. , 1992, Genomics.

[21]  Eran Halperin,et al.  Haplotype reconstruction from genotype data using Imperfect Phylogeny , 2004, Bioinform..

[22]  P. Donnelly,et al.  A new statistical method for haplotype reconstruction from population data. , 2001, American journal of human genetics.

[23]  P Green,et al.  Base-calling of automated sequencer traces using phred. II. Error probabilities. , 1998, Genome research.

[24]  M. Daly,et al.  High-resolution haplotype structure in the human genome , 2001, Nature Genetics.

[25]  D. Haussler,et al.  Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. , 2005, Genome research.

[26]  Jean L. Chang,et al.  Initial sequence of the chimpanzee genome and comparison with the human genome , 2005, Nature.

[27]  R S Judson,et al.  Complex promoter and coding region beta 2-adrenergic receptor haplotypes alter receptor expression and predict in vivo responsiveness. , 2000, Proceedings of the National Academy of Sciences of the United States of America.

[28]  B. Berger,et al.  ARACHNE: a whole-genome shotgun assembler. , 2002, Genome research.

[29]  S. Salzberg,et al.  Versatile and open software for comparing large genomes , 2004, Genome Biology.

[30]  L. Excoffier,et al.  Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. , 1995, Molecular biology and evolution.

[31]  Michael S. Waterman,et al.  Accuracy Assessment of Diploid Consensus Sequences , 2007, TCBB.

[32]  P. Green,et al.  Base-calling of automated sequencer traces using phred. I. Accuracy assessment. , 1998, Genome research.

[33]  O. White,et al.  Environmental Genome Shotgun Sequencing of the Sargasso Sea , 2004, Science.

[34]  W. J. Kent,et al.  BLAT--the BLAST-like alignment tool. , 2002, Genome research.

[35]  Jill P Mesirov,et al.  Assembly of polymorphic genomes: algorithms and application to Ciona savignyi. , 2005, Genome research.

[36]  P. Kwok,et al.  Overlapping genomic sequences: a treasure trove of single-nucleotide polymorphisms. , 1998, Genome research.

[37]  K. Livak,et al.  A homogeneous, ligase-mediated DNA diagnostic test. , 1998, Genome research.

[38]  Eric S. Lander,et al.  An SNP map of the human genome generated by reduced representation shotgun sequencing , 2000, Nature.

[39]  Paul Richardson,et al.  The Draft Genome of Ciona intestinalis: Insights into Chordate and Vertebrate Origins , 2002, Science.

[40]  K. Kidd,et al.  HAPLO: a program using the EM algorithm to estimate the frequencies of multi-site haplotypes. , 1995, The Journal of heredity.

[41]  George Newport,et al.  The diploid genome sequence of Candida albicans. , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[42]  J. Long,et al.  An E-M algorithm and testing strategy for multiple-locus haplotypes. , 1995, American journal of human genetics.

[43]  Jun S. Liu,et al.  Monte Carlo strategies in scientific computing , 2001 .