Minimal-dot plot: "Old tale in new skin" about sequence comparison

The authors propose a simple variant of the dot-plot scheme for cases where the distance between sequence elements can take more than two values. The method applies, in particular, to sequences of long windows, for which the set of distance values is continuous. The technique is simple to implement and yields readable maps suitable for further analysis. To illustrate its potential, the method was applied to the comparison of genomic sequences. An asymmetry between the numbers of direct and reverse tracks was found in the Homo sapiens genome.
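The abstract does not spell out the marking rule, so the following is only a minimal sketch of one plausible reading: each window of the first sequence receives a single dot at the window of the second sequence to which its distance is smallest. The window size, the dinucleotide frequency profile, the L1 distance, and all function names here are illustrative assumptions, not the authors' definitions.

```python
from collections import Counter
from itertools import product


def kmer_profile(window, k=2):
    """Normalised k-mer frequency vector over the ACGT alphabet (assumed feature)."""
    counts = Counter(window[i:i + k] for i in range(len(window) - k + 1))
    total = sum(counts.values()) or 1
    return [counts["".join(p)] / total for p in product("ACGT", repeat=k)]


def windows(seq, size, step):
    """Cut a sequence into fixed-size windows taken every `step` positions."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, step)]


def l1(u, v):
    """L1 distance between two profiles; it takes a continuum of values."""
    return sum(abs(a - b) for a, b in zip(u, v))


def minimal_dot_plot(seq_a, seq_b, size=8, step=8, k=2):
    """Return one (row, column) dot per window of seq_a.

    Assumed rule: the dot goes to the seq_b window at minimal distance
    (first one wins on ties). This is a hypothetical reading of the abstract.
    """
    prof_a = [kmer_profile(w, k) for w in windows(seq_a, size, step)]
    prof_b = [kmer_profile(w, k) for w in windows(seq_b, size, step)]
    dots = []
    for i, pa in enumerate(prof_a):
        dists = [l1(pa, pb) for pb in prof_b]
        j = min(range(len(dists)), key=dists.__getitem__)
        dots.append((i, j))
    return dots


if __name__ == "__main__":
    a = "ACGTACGT" + "AAAAAAAA" + "CCCCCCCC"
    b = "CCCCCCCC" + "ACGTACGT" + "AAAAAAAA"
    print(minimal_dot_plot(a, b))
```

Under this reading, runs of dots along a diagonal would correspond to direct tracks; detecting reverse tracks would additionally require comparing against the reverse complement, which this sketch omits.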
