Genome Compartmentation by a Hybrid Chromosome Model (HXM). Application to Saccharomyces cerevisiae Subtelomeres

This paper presents a new approach, the 'Hybrid Chromosome Model' (HXM), which both extracts regions of similarity between two sequences and compartmentalizes a set of DNA sequences. The method splits the sequences into fragments of fixed length and compacts them into a 'hybrid chromosome', obtained by stacking all of the sequence fragments. We illustrate the approach on the 32 subtelomeres of Saccharomyces cerevisiae, partitioning these chromosome extremities into common regions of similarity. HXM is a fast and efficient tool for mapping entire genomes and for extracting ancient duplications within or between genomes.
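As a rough illustration of the fragment-and-stack idea only, the sketch below cuts each sequence into non-overlapping fixed-length fragments and summarizes the stacked fragments as position-wise nucleotide counts. It is not the authors' algorithm: the function names, the toy sequences, the fragment length of 4, and the use of a count profile as the "stack" are all assumptions made for this example, and the step that decides where each fragment sits along the hybrid chromosome is not described in the abstract and is left out.

```python
from collections import Counter


def split_into_fragments(seq, length):
    """Cut seq into consecutive, non-overlapping fragments of a fixed length.

    A trailing remainder shorter than `length` is dropped (an assumption
    of this sketch, not something stated in the abstract).
    """
    return [seq[i:i + length] for i in range(0, len(seq) - length + 1, length)]


def stack_fragments(fragments):
    """Return position-wise nucleotide counts over equal-length fragments."""
    if not fragments:
        return []
    length = len(fragments[0])
    return [Counter(frag[pos] for frag in fragments) for pos in range(length)]


# Hypothetical toy input; real subtelomere sequences are far longer.
sequences = {"seq_a": "ACGTACGTAC", "seq_b": "ACGAACGT"}
fragment_length = 4

all_fragments = [
    frag
    for seq in sequences.values()
    for frag in split_into_fragments(seq, fragment_length)
]
profile = stack_fragments(all_fragments)

for pos, counts in enumerate(profile):
    print(pos, dict(counts))
```

Positions where one nucleotide dominates the stack indicate that the fragments agree there, which is the intuition behind reading regions of similarity off a compacted representation.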
