DNA Data Visualization (DDV): Software for Generating Web-Based Interfaces Supporting Navigation and Analysis of DNA Sequence Data of Entire Genomes

Data visualization methods are necessary for exploration and analysis in an increasingly data-intensive scientific process, yet few visualization methods exist for the raw nucleotide sequence of a whole genome or chromosome. Data visualization software should allow researchers to create accessible visualization interfaces that can be exported and shared with others on the web. Herein, novel software for generating DNA data visualization interfaces is described. The software converts DNA data sets into images, which are then processed into multi-scale images accessed through a web-based interface that supports zooming, panning and selection of sequence fragments. The interface reports the nucleotide composition frequencies and GC skew of a selected sequence segment. The software was used to generate DNA data visualizations of human and bacterial chromosomes. Examples of visually detectable features in microbial and human chromosomes, such as short and long direct repeats, long terminal repeats, mobile genetic elements and heterochromatic segments, are presented. The software and its source code are available for download and further development. The visualization interfaces generated with the software are readily accessible through a web browser and allow immediate identification and observation of several types of sequence patterns in genomes of various sizes and origins. The software is a useful research and teaching tool for genetics and structural genomics.
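As a rough illustration of the image-conversion step, the sketch below assumes one pixel per nucleotide, laid out row by row at a fixed width. The four-color palette, the gray used for N and other ambiguous symbols, the white padding, and the function names `sequence_to_pixels` and `to_ppm` are all hypothetical choices, not the software's documented scheme. It uses only the Python standard library and serializes the result as a plain-text PPM image.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]

# Hypothetical palette (an assumption, not necessarily the software's colors).
PALETTE: Dict[str, RGB] = {
    "A": (255, 0, 0),
    "T": (0, 0, 255),
    "G": (0, 160, 0),
    "C": (255, 200, 0),
}
UNKNOWN: RGB = (128, 128, 128)  # N or any other non-ACGT symbol
PADDING: RGB = (255, 255, 255)  # fills the unused tail of the last row


def sequence_to_pixels(seq: str, width: int) -> List[List[RGB]]:
    """Lay out one pixel per nucleotide, `width` pixels per row."""
    if width <= 0:
        raise ValueError("width must be positive")
    seq = seq.upper()
    rows: List[List[RGB]] = []
    for start in range(0, len(seq), width):
        chunk = seq[start:start + width]
        row = [PALETTE.get(base, UNKNOWN) for base in chunk]
        row.extend([PADDING] * (width - len(row)))  # pad a short final row
        rows.append(row)
    return rows


def to_ppm(rows: List[List[RGB]]) -> str:
    """Serialize pixel rows as a plain-text PPM (P3) image."""
    height = len(rows)
    width = len(rows[0]) if rows else 0
    lines = ["P3", f"{width} {height}", "255"]
    for row in rows:
        lines.append(" ".join(f"{r} {g} {b}" for r, g, b in row))
    return "\n".join(lines) + "\n"
```

For example, `sequence_to_pixels("ACGTNAC", 4)` yields two rows; the second starts with the gray N pixel and ends with one padding pixel.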
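The multi-scale images that support zooming and panning can be thought of as an image pyramid. The sketch below assumes a pyramid organized like the Deep Zoom format, in which each level halves the previous one (rounding up) until a 1x1 image remains, and uses a simple 2x2 box filter for downsampling. The function names and the 256-pixel tile size are hypothetical; a production pipeline would normally delegate this work to a dedicated image-processing library.

```python
import math
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # an (R, G, B) triple


def pyramid_levels(width: int, height: int) -> List[Tuple[int, int]]:
    """Return (width, height) for every level, from full resolution down to 1x1.

    Each level halves the previous one, rounding up (Deep Zoom-style; assumed).
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    levels = [(width, height)]
    while width > 1 or height > 1:
        width = math.ceil(width / 2)
        height = math.ceil(height / 2)
        levels.append((width, height))
    return levels


def tile_grid(width: int, height: int, tile_size: int = 256) -> Tuple[int, int]:
    """Columns and rows of tiles needed to cover one level (hypothetical tile size)."""
    return math.ceil(width / tile_size), math.ceil(height / tile_size)


def downsample(rows: List[List[Pixel]]) -> List[List[Pixel]]:
    """Halve an image by averaging each 2x2 block (box filter).

    Blocks on odd-sized right or bottom edges average only the pixels present,
    so the output is ceil(width / 2) x ceil(height / 2), matching pyramid_levels.
    """
    if not rows:
        return []
    height, width = len(rows), len(rows[0])
    out: List[List[Pixel]] = []
    for y in range(0, height, 2):
        out_row: List[Pixel] = []
        for x in range(0, width, 2):
            block = [rows[yy][xx]
                     for yy in range(y, min(y + 2, height))
                     for xx in range(x, min(x + 2, width))]
            out_row.append(tuple(sum(p[i] for p in block) // len(block) for i in range(3)))
        out.append(out_row)
    return out
```

For a 1000 x 300 image, `pyramid_levels` returns 11 levels ending at (1, 1), and `tile_grid(1000, 300)` gives a 4 x 2 grid of 256-pixel tiles at full resolution.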
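The composition and GC skew readouts for a selected segment can be computed from standard definitions: each nucleotide's frequency is its count divided by the segment length, and GC skew is (G - C)/(G + C). The following minimal sketch assumes 1-based inclusive coordinates for the selection and counts any symbol other than A, C, G or T as "other"; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional


def select_segment(sequence: str, start: int, end: int) -> str:
    """Return bases start..end using 1-based, inclusive coordinates (assumed convention)."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("selection must satisfy 1 <= start <= end <= len(sequence)")
    return sequence[start - 1:end]


def composition(segment: str) -> Dict[str, float]:
    """Frequency of A, C, G and T in the segment; other symbols are pooled as 'other'."""
    if not segment:
        raise ValueError("segment is empty")
    counts = Counter(segment.upper())
    total = len(segment)
    freqs = {base: counts[base] / total for base in "ACGT"}
    acgt = sum(counts[base] for base in "ACGT")
    freqs["other"] = (total - acgt) / total
    return freqs


def gc_skew(segment: str) -> Optional[float]:
    """GC skew = (G - C) / (G + C); None when the segment contains no G or C."""
    seq = segment.upper()
    g, c = seq.count("G"), seq.count("C")
    if g + c == 0:
        return None
    return (g - c) / (g + c)
```

For example, `gc_skew("GGGCAT")` returns 0.5, since the segment contains three G and one C.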
