Finding checkerboard patterns via fractional 0–1 programming

Biclustering is a data mining technique that simultaneously partitions the set of samples and the set of their attributes (features) into subsets (clusters). Samples and features clustered together are expected to be highly relevant to each other. In this paper we provide a new mathematical programming formulation for unsupervised biclustering. The proposed model requires solving a fractional 0–1 programming problem. We develop a linear mixed 0–1 reformulation as well as two heuristic-based approaches. Encouraging computational results on clustering real DNA microarray data sets are presented. We also discuss theoretical computational complexity issues related to biclustering.
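To make the term concrete, the sketch below solves a tiny fractional 0–1 program in its generic form: maximize a sum of ratios of affine functions over binary vectors. It is an illustration only, not the biclustering model of this paper. The function name `solve_fractional_01`, the tuple layout of each ratio, and the sample coefficients are all assumptions made for the example, and plain enumeration stands in for the reformulation and heuristics mentioned above.

```python
from itertools import product


def solve_fractional_01(ratios, n):
    """Maximize sum_k (a0_k + a_k.x) / (b0_k + b_k.x) over x in {0,1}^n.

    Hypothetical helper: each ratio is a tuple (a0, a, b0, b), where a and b
    are length-n coefficient sequences. Points with a non-positive
    denominator are skipped. Exhaustive search, so only usable for small n.
    """
    best_x, best_val = None, float("-inf")
    for x in product((0, 1), repeat=n):
        total = 0.0
        feasible = True
        for a0, a, b0, b in ratios:
            den = b0 + sum(bi * xi for bi, xi in zip(b, x))
            if den <= 0:
                feasible = False
                break
            num = a0 + sum(ai * xi for ai, xi in zip(a, x))
            total += num / den
        if feasible and total > best_val:
            best_x, best_val = x, total
    return best_x, best_val


# Illustrative data (assumed): a single ratio (3*x1 + x2 + 2*x3) / (1 + x1 + x2 + x3).
ratios = [(0, (3, 1, 2), 1, (1, 1, 1))]
best_x, best_val = solve_fractional_01(ratios, n=3)
print(best_x, best_val)
```

Enumeration visits all 2^n binary vectors, so it only serves to show the shape of the objective; larger instances call for a mixed 0–1 linear reformulation or heuristics.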
