Sequence Bundles: a novel method for visualising, discovering and exploring sequence motifs

Background
We introduce Sequence Bundles, a novel data visualisation method for representing multiple sequence alignments (MSAs). We identify and address key limitations of existing bioinformatics data visualisation methods, in particular the Sequence Logo. Sequence Bundles gives salient visual expression to sequence motifs and other data features that would otherwise remain hidden.

Methods
We developed Sequence Bundles using research-led information design methodologies. Sequences are encoded as uninterrupted, semi-opaque lines plotted on a two-dimensional reconfigurable grid, with each line representing a single sequence. The thickness and opacity of the stack of lines at each residue in each position indicate the level of conservation. The lines' curved paths expose patterns of correlation and functionality. Several MSAs can be visualised in a composite image. The method is designed to favour a tangible, continuous and intuitive display of information.

Results
We have developed a software demonstration application that generates Sequence Bundles visualisations of the MSAs provided for the BioVis 2013 redesign contest. Subsequent exploration of the visualised line patterns revealed a number of interesting features in the dataset. Reported features include the extreme conservation of sequences displaying a specific residue and bifurcations of the consensus sequence.

Conclusions
Sequence Bundles is a novel method for visualising MSAs and discovering sequence motifs. It can aid in generating new insights and hypotheses. Sequence Bundles is well suited to future implementation as interactive visual analytics software that complements existing visualisation tools.
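The Methods paragraph describes the encoding only in words. The following Python is a minimal sketch of that encoding. It assumes a grid whose x-axis is the alignment column and whose y-axis is a fixed residue order. The ordering below is hypothetical; the abstract says only that the grid is reconfigurable. The sketch maps each aligned sequence to a polyline of grid points and computes the fraction of lines that share each cell, which is what the stack thickness and opacity express. The helper names `sequence_to_path`, `stack_weights` and `stack_opacity`, the `RESIDUE_ORDER` string and the default per-line opacity are illustrative assumptions, not the authors' implementation.

```python
"""Sketch of a Sequence Bundles-style layout (illustrative assumptions only)."""
from collections import Counter

# Hypothetical y-axis order: 20 amino acids loosely grouped by property, plus
# '-' for alignment gaps. The original grid is reconfigurable; this is one choice.
RESIDUE_ORDER = "GAVLIMFWPSTCYNQDEKRH-"
ROW = {res: i for i, res in enumerate(RESIDUE_ORDER)}


def sequence_to_path(seq):
    """Map one aligned sequence to grid points (x = column, y = residue row).

    Raises KeyError for symbols outside RESIDUE_ORDER (e.g. 'X').
    """
    return [(col, ROW[res]) for col, res in enumerate(seq)]


def stack_weights(msa):
    """Fraction of sequences passing through each (column, residue) cell.

    A value near 1.0 marks a conserved residue, i.e. a thick stack of lines.
    """
    n = len(msa)
    counts = Counter((col, res) for seq in msa for col, res in enumerate(seq))
    return {cell: c / n for cell, c in counts.items()}


def stack_opacity(k, alpha=0.05):
    """Combined opacity of k overlapping same-colour lines, each of opacity
    `alpha`, under source-over alpha compositing: 1 - (1 - alpha)**k."""
    return 1.0 - (1.0 - alpha) ** k


if __name__ == "__main__":
    msa = ["GKVA", "GKIA", "GRVA", "GKV-"]  # toy alignment, not contest data
    for seq in msa:
        print(seq, sequence_to_path(seq))
    weights = stack_weights(msa)
    print(weights[(0, "G")])  # 1.0: every sequence has G in column 0
    print(weights[(1, "K")])  # 0.75: three of four sequences have K
```

A renderer would draw each path as a smooth curve through its points, using a small per-line alpha. Cells shared by many sequences then build up into thick, dark stacks, while rare residues stay faint. Under standard source-over compositing, k identical lines of opacity a combine to 1 - (1 - a)^k. The `stack_opacity` function computes this value, so conservation is expressed by opacity as well as by stack thickness.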
