Visualizing biosequence data using texture mapping

Data mining through pattern discovery in protein sequences has been predominantly algorithm-based. We discuss a visualization approach that uses texture mapping and blending techniques to perform visual data mining on the text data produced by pattern discovery in protein sequences. This visual approach investigates the representation of text data in three dimensions and opens new possibilities for representing more dimensions of information in text data visualization and analysis. We also present a generic framework, derived from this approach, for visualizing text in biosequence data.
