Feature Selection for Consistent Biclustering via Fractional 0–1 Programming

Biclustering consists of simultaneously partitioning the set of samples and the set of their attributes (features) into subsets (classes). Samples and features classified together are expected to be highly relevant to each other, which can be observed through the intensity of their expression. We define the notion of consistency for biclustering using the interrelation between the centroids of sample and feature classes, and we prove that consistent biclustering implies separability of the classes by convex cones. Previous work on biclustering concentrated on unsupervised learning and did not consider employing a training set whose classification is given; we propose a model for supervised biclustering whose consistency is achieved by feature selection. The developed model involves solving a fractional 0–1 programming problem. Preliminary computational results on microarray data mining problems are reported.
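
The following Python sketch illustrates the idea of centroid-based consistency and of feature selection that restores it. It assumes a data matrix `A` with features as rows and samples as columns, and it assumes one reading of consistency: every feature's class is the strict argmax of its sample-class centroids, and every sample's class is the strict argmax of its feature-class centroids. The helper names (`centroids`, `is_consistent`, `select_features_bruteforce`) are hypothetical, and the exhaustive search is a stand-in for a real fractional 0–1 programming solver, not the method used in the paper.

```python
import itertools

import numpy as np


def centroids(A, labels, r, axis):
    """Class centroids of A (features x samples).

    axis=1: average the sample columns of each class -> shape (m, r)
    axis=0: average the feature rows of each class  -> shape (n, r)
    """
    out = []
    for k in range(r):
        mask = labels == k
        if not mask.any():
            raise ValueError(f"class {k} is empty")
        out.append(A.compress(mask, axis=axis).mean(axis=axis))
    return np.stack(out, axis=1)


def strictly_dominant(C, labels):
    """True if, in every row t, C[t, labels[t]] strictly exceeds every other entry of that row."""
    rows = np.arange(len(labels))
    own = C[rows, labels]
    others = C.copy()
    others[rows, labels] = -np.inf
    return bool(np.all(own > others.max(axis=1)))


def is_consistent(A, s_lab, f_lab, r):
    """Assumed consistency test: each classification is the strict argmax of the
    centroids induced by the other classification."""
    C_S = centroids(A, s_lab, r, axis=1)  # C_S[i, k]: feature i averaged over sample class k
    C_F = centroids(A, f_lab, r, axis=0)  # C_F[j, k]: sample j averaged over feature class k
    return strictly_dominant(C_S, f_lab) and strictly_dominant(C_F, s_lab)


def select_features_bruteforce(A, s_lab, r):
    """Hypothetical helper: largest feature subset giving a consistent biclustering.

    Exhaustive search stands in for a real fractional 0-1 solver; it is
    exponential in the number of features and only suitable for toy data.
    """
    m = A.shape[0]
    # Assumption: each feature joins the sample class where its centroid is highest.
    f_lab = centroids(A, s_lab, r, axis=1).argmax(axis=1)
    for size in range(m, 0, -1):
        for subset in itertools.combinations(range(m), size):
            idx = np.array(subset)
            sub_f = f_lab[idx]
            if len(np.unique(sub_f)) < r:
                continue  # every feature class needs at least one feature (nonzero denominator)
            if is_consistent(A[idx], s_lab, sub_f, r):
                return idx, sub_f
    return None


# Toy data: rows are features, columns are samples; samples 0 and 1 form class 0, samples 2 and 3 class 1.
A = np.array([
    [5, 6, 1, 2],
    [1, 2, 6, 5],
    [4, 4, 3, 3],
    [0, 10, 8, 8],   # noisy feature: high in sample 1, yet assigned to class 1
], dtype=float)
s_lab = np.array([0, 0, 1, 1])
print(select_features_bruteforce(A, s_lab, r=2))  # keeps features 0, 1, 2; drops feature 3
```

With all four features, sample 1 averages 5 over feature class 0 but 6 over feature class 1, so the biclustering is inconsistent; dropping feature 3 fixes it. For a 0–1 selection vector x, the feature-class centroid of sample j is a ratio of two linear functions of x (a sum of selected expression values over a count of selected features), which is why requiring consistency while keeping as many features as possible leads to a fractional 0–1 program rather than a linear one.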
