Towards Constrained Co-clustering in Ordered 0/1 Data Sets

Within 0/1 data, co-clustering provides a collection of bi-clusters, i.e., linked clusters for both objects and Boolean properties. Besides the classical need to optimize grouping quality, one can also use user-defined constraints to capture subjective aspects of interestingness and thus improve bi-cluster relevancy. We consider the case of 0/1 data where at least one dimension is ordered, e.g., objects denote time points, and we introduce co-clustering under interval constraints. Exploiting such constraints during the intrinsically heuristic clustering process is challenging. We propose one major step in this direction, in which bi-clusters are computed from collections of local patterns. We provide an experimental validation on two temporal gene expression data sets.
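
As an illustration only, one plausible reading of an interval constraint on an ordered dimension is that the time points of a bi-cluster must form a contiguous run. The sketch below checks that condition on toy data. The function name `is_interval`, the representation of a bi-cluster as a pair of sets, and the sample values are all assumptions made for this example; they are not taken from the paper, whose exact definition may differ.

```python
from typing import Iterable


def is_interval(positions: Iterable[int]) -> bool:
    """Return True if the positions form one contiguous run with no gaps."""
    ordered = sorted(set(positions))
    if not ordered:
        return False
    # A gap-free run of k distinct integers spans exactly k values.
    return ordered[-1] - ordered[0] + 1 == len(ordered)


# Hypothetical bi-clusters: (time-point indices, Boolean property labels).
biclusters = [
    ({0, 1, 2}, {"g1", "g4"}),  # time points 0..2 are contiguous
    ({1, 3, 4}, {"g2"}),        # time point 2 is missing, so there is a gap
]

# Keep only the bi-clusters whose time points satisfy the assumed constraint.
satisfying = [bc for bc in biclusters if is_interval(bc[0])]
```

Here the check is applied as a filter after the fact; how such a constraint would be enforced during the clustering process itself is not covered by this sketch.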
