Exploratory visual analysis of conserved domains on multiple sequence alignments

Background

Multiple alignment of protein sequences can provide insight into sequence conservation across many species and thus allow identification of those sections of the sequence most critical to protein function. This insight can be augmented by joint display of conserved domains along the sequences. By fusing this metadata visually, biologists can analyze sequence conservation and functional motifs simultaneously and efficiently.

Results

We present MSAVis, a new approach combining luminance and hue for simultaneous visualization of conserved motifs and sequence alignment. Input for the algorithm is a multiple sequence alignment in a standard format. The NCBI Conserved Domain Database (CDD) is used for finding conserved domains along the alignment. The visualization quickly identifies conserved domains, and allows both macro (sequence-long) and micro (small amino-acid neighborhood) views.

Conclusion

MSAVis utilizes two visual cues, luminance and hue, to facilitate at-a-glance summary of the conservation of a user-provided protein alignment while enabling multiple comparisons among functional domains. These visual cues are preattentive and separable so that the relationship between conservation strength and domain membership can be understood. The MSAVis software, written in Python and using BioPython and OpenGL, can be found at http://agbase.msstate.edu/tools/MSAVis.html.
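The two-cue idea can be illustrated with a minimal Python sketch. It is not the MSAVis implementation: the function names `column_conservation` and `column_colors`, the majority-fraction conservation score, the assignment of hue to domain membership and luminance to conservation, and the use of the standard-library HLS colour model are all assumptions made for illustration.

```python
import colorsys
from collections import Counter


def column_conservation(alignment):
    """Score each alignment column in [0, 1].

    Assumed score: the fraction of sequences that share the most common
    non-gap residue. Columns made only of gaps score 0.
    """
    n_rows = len(alignment)
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / n_rows)
    return scores


def column_colors(scores, domains, domain_hues):
    """Map each column to an RGB tuple.

    domains: list of (start, end, name) with end exclusive (hypothetical layout).
    domain_hues: dict of domain name -> hue in [0, 1).
    Hue marks domain membership; lightness falls as conservation rises.
    """
    colors = []
    for i, score in enumerate(scores):
        hue, saturation = 0.0, 0.0  # grey for columns outside every domain
        for start, end, name in domains:
            if start <= i < end:
                hue, saturation = domain_hues[name], 1.0
                break
        lightness = 0.9 - 0.6 * score  # more conserved columns are darker
        colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
    return colors


# Toy alignment (equal-length, gapped strings) with one made-up domain.
alignment = ["MKV-A", "MKI-A", "MRV-G"]
scores = column_conservation(alignment)
colors = column_colors(scores, [(0, 2, "kinase")], {"kinase": 0.6})
```

Because hue and lightness are set independently, a reader can compare conservation within one domain (same hue, varying lightness) or compare domains at similar conservation (same lightness, varying hue). A perceptually uniform space such as CIE L\*a\*b\* would be a better fit for luminance than HLS; HLS is used here only because it ships with Python.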
