Computing and visually analyzing mutual information in molecular co-evolution

Background
Selective pressure in molecular evolution leads to uneven distributions of amino acids and nucleotides. In fact, one observes correlations among such constituents that arise from a large number of biophysical mechanisms (folding properties, electrostatics, ...). To quantify these correlations, the mutual information, after proper normalization, has proven most effective. The challenge is to navigate the large amount of data, which for a typical protein cannot simply be plotted.

Results
To visually analyze mutual information, we developed a matrix visualization tool that offers different views on the mutual information matrix, among them filtering, sorting, and weighting. The user can interactively navigate a huge matrix in real time and search, e.g., for patterns and unusually high or low values. The mutual information matrix can be computed from a sequence alignment in FASTA format. The respective stand-alone program additionally computes proper normalizations for a null model of neutral evolution and maps the mutual information to Z-scores with respect to that null model.

Conclusions
The new tool allows users to compute and visually analyze sequence data for possible co-evolutionary signals. It has already been employed successfully in evolutionary studies on HIV1 protease and acetylcholinesterase. Its functionality was defined by users applying the tool in real-world research. The software can also be used for visual analysis of other matrix-like data, such as information obtained from DNA microarray experiments. The package is implemented platform-independently in Java and is free for academic use under a GPL license.
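
To make the quantities named in the abstract concrete, the following is a minimal Python sketch of a column-wise mutual information matrix and a Z-score against a shuffling null model. It is not the tool's Java implementation. The function names (`mutual_information`, `mi_matrix`, `z_score`), the use of bits as the unit, and the choice of column shuffling as a stand-in for the null model of neutral evolution are all assumptions made for illustration; gaps and finite-sample corrections are ignored.

```python
import math
import random
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two equally long alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written in terms of counts
        mi += (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
    return mi


def mi_matrix(sequences):
    """Symmetric matrix of pairwise MI values over all alignment columns."""
    columns = list(zip(*sequences))  # transpose: one tuple per column
    size = len(columns)
    matrix = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i, size):
            value = mutual_information(columns[i], columns[j])
            matrix[i][j] = matrix[j][i] = value
    return matrix


def z_score(col_a, col_b, n_shuffles=200, seed=0):
    """Z-score of the observed MI against a shuffled-column null (an assumption).

    Shuffling one column destroys pairing while keeping its composition;
    the actual null model of the described program may differ.
    """
    rng = random.Random(seed)
    observed = mutual_information(col_a, col_b)
    shuffled = list(col_b)
    null_values = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        null_values.append(mutual_information(col_a, shuffled))
    mean = sum(null_values) / n_shuffles
    variance = sum((v - mean) ** 2 for v in null_values) / n_shuffles
    if variance == 0.0:
        return 0.0  # degenerate null distribution, no meaningful Z-score
    return (observed - mean) / math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical toy alignment: columns 0 and 1 co-vary perfectly.
    alignment = ["ACK", "ACR", "GTK", "GTR"]
    for row in mi_matrix(alignment):
        print(["%.2f" % v for v in row])
```

Sequences would in practice be read from a FASTA alignment; here they are passed as plain strings of equal length to keep the sketch self-contained.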
