Potential Perils of Biological Sequence Visualization Using Sequence Logo

The characteristics of sequence motifs are commonly visualized using sequence logos. This paper describes a user study that evaluates the effectiveness of sequence logos as an evaluation metric for motif prediction tools. We also investigate the nature of confirmation biases when sequence logos are used to report results in publications. Although sequence logos have been widely used to visualize sequence motifs over the past 20 years, no study has reported their effectiveness or their possible misuse in decision making. We conducted a paper-and-pencil test to determine the effectiveness of sequence logos in several of their common uses, and a survey study to investigate their learnability. We found substantial mismatches between users' perception and the actual quality of motifs when sequence logos were used as an evaluation metric. Therefore, evaluations of motif prediction tools based on sequence logos should be interpreted with caution. Our results also suggest that there is still room for improvement in the layout design of current sequence logos.
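For readers less familiar with what a sequence logo encodes, the sketch below shows the classic construction for DNA: each column's stack height is its information content, log2(4) minus the Shannon entropy of the observed base frequencies, and each letter's height is its frequency times that column height. This is an illustrative sketch, not the method or software used in the study. It assumes a uniform background, ungapped aligned sites containing only A, C, G and T, and it omits the small-sample correction used by standard logo tools. The function name `logo_heights` and the example sites are hypothetical.

```python
import math
from collections import Counter

DNA = "ACGT"

def logo_heights(sites):
    """Hypothetical helper: per-column letter heights (in bits) for a DNA logo.

    Column height R = log2(4) - H, where H is the Shannon entropy of the
    observed base frequencies (uniform background assumed). Each base's
    height is its frequency times R. No small-sample correction is applied.
    """
    if not sites:
        raise ValueError("need at least one site")
    length = len(sites[0])
    for s in sites:
        if len(s) != length:
            raise ValueError("all sites must be aligned to the same length")
        if any(ch not in DNA for ch in s):
            raise ValueError("sites may contain only A, C, G and T")

    n = len(sites)
    columns = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        freqs = {b: counts.get(b, 0) / n for b in DNA}
        # Shannon entropy in bits; zero-frequency bases contribute nothing.
        entropy = -sum(f * math.log2(f) for f in freqs.values() if f > 0)
        info = math.log2(len(DNA)) - entropy  # stack height for this column
        columns.append({b: f * info for b, f in freqs.items()})
    return columns
```

For example, with the made-up sites `["TATAAT", "TATAAT", "TACAAT", "TATGAT"]`, the fully conserved first column gets the maximum stack height of 2 bits (all of it T), while the third column, which is 75% T and 25% C, gets only about 1.19 bits. A reader's impression of motif quality from stack heights alone is exactly the kind of judgment the study examines.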
