Comprehensive cluster analysis with Transitivity Clustering

Transitivity Clustering is a method for partitioning biological data into groups of similar objects, such as genes. It provides integrated access to functions that address each step of a typical cluster analysis. To make these functions easy to use, Transitivity Clustering is accessible online and offers three user-friendly interfaces: a powerful stand-alone version, a web interface, and a collection of Cytoscape plug-ins. In this paper, we describe three major workflows: (i) protein (super)family detection with Cytoscape, (ii) protein homology detection with incomplete gold standards, and (iii) clustering of gene expression data. This protocol guides the user through the most important features of Transitivity Clustering and takes ∼1 h to complete.
