Comparing, Contrasting and Combining Clusters in Viral Gene Expression

Massively high-dimensional datasets are fast becoming commonplace, and any advance in reliably partitioning such data into smaller, relatively independent clusters would be highly beneficial. Nowhere is this more true than in functional genomics, where gene expression data can contain thousands of variables and methods to divide the data into manageable portions are urgently needed. Such methods would pave the way for models of biological processes that support explanation and prediction, and could ultimately improve the quality of human life. In this paper we compare and contrast several clustering techniques from both the statistical and Artificial Intelligence communities in the context of viral gene expression data. We introduce a method, Clusterfusion, which takes the results of different clustering algorithms and generates a set of robust clusters based on the consensus among those results.
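To make the idea of consensus among clusterings concrete, the following is a minimal Python sketch of one generic way to combine several cluster assignments: count how many algorithms place each pair of items in the same cluster, then group items whose agreement reaches a threshold. It is an illustration under stated assumptions and not the Clusterfusion algorithm itself, whose details are not given in this abstract. The function names `agreement_counts` and `consensus_clusters`, the `min_agreement` parameter, and the toy labels are all hypothetical.

```python
from itertools import combinations


def agreement_counts(labelings):
    """For every pair of items, count how many labelings co-cluster them.

    `labelings` is a list of equal-length label sequences, one per
    clustering algorithm (hypothetical input format for this sketch).
    """
    n = len(labelings[0])
    if any(len(labels) != n for labels in labelings):
        raise ValueError("all labelings must cover the same items")
    counts = {}
    for i, j in combinations(range(n), 2):
        counts[(i, j)] = sum(labels[i] == labels[j] for labels in labelings)
    return counts


def consensus_clusters(labelings, min_agreement=None):
    """Group items linked by pairs that at least `min_agreement` labelings
    co-cluster. Defaults to full agreement across all labelings."""
    n = len(labelings[0])
    if min_agreement is None:
        min_agreement = len(labelings)
    counts = agreement_counts(labelings)

    # Union-find over items; pairs with enough agreement are merged.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), count in counts.items():
        if count >= min_agreement:
            parent[find(j)] = find(i)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy example: three algorithms label six genes (indices 0 to 5).
labelings = [
    [0, 0, 0, 1, 1, 2],
    [1, 1, 0, 0, 0, 2],
    [0, 0, 1, 1, 1, 1],
]
print(consensus_clusters(labelings))                   # full agreement
print(consensus_clusters(labelings, min_agreement=2))  # majority agreement
```

With full agreement, co-clustering is an intersection of equivalence relations, so every item in a resulting group is paired with every other by all algorithms. With a lower threshold, groups are connected components and may chain together items that never directly meet the threshold, so a stricter grouping rule may be preferable in practice.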