Clustering pair-wise dissimilarity data into partially ordered sets

Ontologies represent data relationships as hierarchies of possibly overlapping classes. They are closely related to clustering hierarchies, and in this paper we explore that relationship in depth. In particular, we examine the space of ontologies that can be generated from pairwise dissimilarity matrices. We demonstrate that classical clustering algorithms, which take dissimilarity matrices as inputs, do not incorporate all of the available information; in fact, only special types of dissimilarity matrices can be exactly preserved by previous clustering methods. We model an ontology as a partially ordered set (poset) of clusters under the subset relation, and we propose a new clustering algorithm that generates such a poset from a dissimilarity matrix.
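
To make the idea of a poset of overlapping clusters concrete, the following is a minimal sketch in Python. It is not the algorithm proposed in the paper, whose details are not given in this abstract. It assumes one simple construction: for each distinct dissimilarity value t, take the maximal cliques of the graph that links items with dissimilarity at most t, then order all resulting clusters by strict set inclusion. The names maximal_cliques and cluster_poset are hypothetical and chosen for illustration only.

```python
from itertools import combinations


def maximal_cliques(adj):
    """Enumerate maximal cliques with basic Bron-Kerbosch (no pivoting).

    adj maps each vertex to the set of its neighbours.
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-explored vertices
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def cluster_poset(d):
    """Illustrative construction (an assumption, not the paper's method).

    d is a symmetric dissimilarity matrix given as a list of lists.
    Returns (clusters, order), where order holds pairs (a, b) with a
    strictly contained in b.
    """
    n = len(d)
    thresholds = sorted({d[i][j] for i, j in combinations(range(n), 2)})
    clusters = {frozenset([i]) for i in range(n)}  # singletons at the bottom
    for t in thresholds:
        # Threshold graph: link items whose dissimilarity is at most t.
        adj = {i: {j for j in range(n) if j != i and d[i][j] <= t}
               for i in range(n)}
        clusters.update(maximal_cliques(adj))
    order = {(a, b) for a in clusters for b in clusters if a < b}
    return clusters, order


# Item 1 is close to both 0 and 2, but 0 and 2 are far apart.
d = [[0, 1, 3],
     [1, 0, 1],
     [3, 1, 0]]
clusters, order = cluster_poset(d)
for c in sorted(clusters, key=lambda s: (len(s), sorted(s))):
    print(sorted(c))
```

In this toy input the clusters {0, 1} and {1, 2} overlap in item 1 and neither contains the other, so the result is a poset rather than a tree. A strictly nested hierarchy cannot hold both clusters at once, which illustrates the kind of information the abstract says classical methods do not preserve.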
