Flexible information visualization of multivariate data from biological sequence similarity searches

Information visualization is challenging because it must represent abstract data and the relationships within that data. Previously, we presented a system for visualizing similarities between a single DNA sequence and a large database of other DNA sequences (E.H. Chi et al., 1995). Similarity algorithms report their results as text that can run to hundreds or thousands of pages, and our original system visualized the most important variables from these reports. The biologists we work with found that system so useful that they requested visual representations of other variables. We present an enhanced system for interactive exploration of this multivariate data. We identify a larger set of useful variables in the information space; because the new system involves more variables, it focuses on exploring subsets of the data. The system allows different variables to be mapped to different axes, incorporates animation through a time axis, and provides tools for viewing subsets of the data. Detail-on-demand is preserved by hyperlinks to the analysis reports. We present three case studies illustrating the use of these techniques. The combination of a time axis with a 3D scatter plot and query filters, applied to the visualization of biological sequence similarity data, is both powerful and novel.
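The interaction model described in the abstract (mapping variables to axes, animating along a time axis, and filtering to a subset) can be illustrated with a minimal sketch. This is not the authors' implementation. The `Hit` record, its field names (`query_pos`, `db_pos`, `score`, `frame`, `report_url`), and the helper functions are all hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Hit:
    """One similarity hit; every field name here is an illustrative assumption."""
    query_pos: int
    db_pos: int
    score: float
    frame: int
    report_url: str  # hyperlink back to the textual report (detail-on-demand)


def query_filter(hits: List[Hit], ranges: Dict[str, Tuple[float, float]]) -> List[Hit]:
    """Keep only hits whose variables lie inside every (low, high) range."""
    return [
        h for h in hits
        if all(low <= getattr(h, var) <= high for var, (low, high) in ranges.items())
    ]


def map_to_axes(hits: List[Hit], mapping: Dict[str, str]) -> List[Dict[str, float]]:
    """Turn each hit into a point: axis name -> value of the variable mapped to it."""
    return [{axis: getattr(h, var) for axis, var in mapping.items()} for h in hits]


def frames_by_time(points: List[Dict[str, float]]) -> Dict[float, List[Dict[str, float]]]:
    """Group points by their 'time' coordinate so they can be shown frame by frame."""
    frames: Dict[float, List[Dict[str, float]]] = {}
    for p in points:
        frames.setdefault(p["time"], []).append(p)
    return dict(sorted(frames.items()))


# Made-up example data, not taken from the paper.
hits = [
    Hit(10, 200, 85.0, 1, "report.html#1"),
    Hit(40, 950, 42.0, 2, "report.html#2"),
    Hit(75, 300, 91.0, 1, "report.html#3"),
]
mapping = {"x": "query_pos", "y": "db_pos", "z": "score", "time": "frame"}

subset = query_filter(hits, {"score": (50.0, 100.0)})  # query filter on one variable
points = map_to_axes(subset, mapping)                  # user-chosen axis assignment
frames = frames_by_time(points)                        # one scatter plot per time step
```

Changing `mapping` reassigns variables to axes without touching the data, which is the kind of flexibility the abstract describes. A real system would render each frame as a 3D scatter plot and attach `report_url` to each plotted point.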

[1]  H. J. Jeffrey.  Chaos game representation of gene structure. , 1990, Nucleic acids research.

[2]  Alfred Inselberg,et al.  Parallel coordinates: a tool for visualizing multi-dimensional geometry , 1990, Proceedings of the First IEEE Conference on Visualization: Visualization '90.

[3]  Ben Shneiderman,et al.  Visual information seeking: tight coupling of dynamic query filters with starfield displays , 1994, CHI '94.

[4]  Thom Grace,et al.  Computer visualization of long genomic sequences , 1993, Proceedings Visualization '93.

[5]  David J. States,et al.  Identification of protein coding regions by database similarity search , 1993, Nature Genetics.

[6]  Daniel Asimov,et al.  The grand tour: a tool for viewing multidimensional data , 1985.

[7]  David L. Donoho,et al.  MacSpin: dynamic graphics on a desktop computer , 1988, IEEE Computer Graphics and Applications.

[8]  Alfred Inselberg,et al.  Multidimensional Lines. I: Representation , 1994, SIAM J. Appl. Math.

[9]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[10]  Ben Shneiderman,et al.  Visual information seeking: tight coupling of dynamic query filters with starfield displays , 1994, CHI Conference Companion.

[11]  Steven K. Feiner,et al.  Visualizing n-dimensional virtual worlds with n-vision , 1990, I3D '90.

[12]  S. Karlin,et al.  Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. , 1990, Proceedings of the National Academy of Sciences of the United States of America.

[13]  John Riedl,et al.  Visualization of biological sequence similarity search results , 1995, Proceedings Visualization '95.

[14]  Allan R. Wilks,et al.  Visualizing Network Data , 1995, IEEE Trans. Vis. Comput. Graph.

[15]  S Henikoff,et al.  Performance evaluation of amino acid substitution matrices , 1993, Proteins.

[16]  M. O. Dayhoff,et al.  A Model of Evolutionary Change in Proteins , 1978.

[17]  Alfred Inselberg,et al.  Parallel coordinates for visualizing multi-dimensional geometry , 1987.

[18]  E. Hamori,et al.  H curves, a novel method of representation of nucleotide series especially suited for long DNA sequences. , 1983, The Journal of biological chemistry.

[20]  John W. Tukey,et al.  PRIM-9: An Interactive Multi-dimensional Data Display and Analysis System , 1975, ACM Pacific.