Inapproximability of Maximum Weighted Edge Biclique and Its Applications

Given a bipartite graph $G = (V_1, V_2, E)$ whose edges take both positive and negative weights from a set S, the maximum weighted edge biclique problem, or S-MWEB for short, asks for a biclique subgraph whose sum of edge weights is maximized. This problem has various applications in bioinformatics, machine learning and databases, and its (in)approximability remains open. In this paper, we show that for a wide range of choices of S, specifically when $|\min S / \max S| \in \Omega(\eta^{\delta - 1/2}) \cap O(\eta^{1/2 - \delta})$ (where $\eta = \max\{|V_1|, |V_2|\}$ and $\delta \in (0, 1/2)$), no polynomial time algorithm can approximate S-MWEB within a factor of $n^{\epsilon}$ for some $\epsilon > 0$ unless RP = NP. This hardness result justifies the heuristic approaches adopted for various applied problems in the aforementioned areas, and indicates that good approximation algorithms are unlikely to exist. Specifically, we give two applications by showing that: 1) finding statistically significant biclusters in the SAMBA model, proposed in [18] for the analysis of microarray data, is $n^{\epsilon}$-inapproximable; and 2) no polynomial time algorithm exists for the Minimum Description Length with Holes problem [4] unless RP = NP.
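To make the S-MWEB objective concrete, the following is a minimal brute-force sketch, not an algorithm from the paper. It assumes the instance is given as a full weight matrix (that is, every pair in V1 × V2 carries a weight from S), and the function name `max_weight_biclique` is hypothetical. It enumerates every subset of V1, so it runs in time exponential in |V1| and is only usable on toy instances, in line with the hardness result stated above.

```python
from itertools import combinations


def max_weight_biclique(weights):
    """Exhaustive search for a maximum-weight biclique.

    weights[i][j] is the weight of the edge between vertex i of V1 and
    vertex j of V2. Assumption: the matrix is complete, so any choice of
    rows and columns induces a biclique.

    Returns (total_weight, rows, cols). The empty biclique has weight 0.
    """
    n_rows = len(weights)
    n_cols = len(weights[0]) if weights else 0
    best = (0, (), ())

    for size in range(1, n_rows + 1):
        for rows in combinations(range(n_rows), size):
            # With the rows fixed, columns contribute independently:
            # a column helps exactly when its sum over the rows is positive.
            col_sums = [sum(weights[i][j] for i in rows) for j in range(n_cols)]
            cols = tuple(j for j, s in enumerate(col_sums) if s > 0)
            total = sum(col_sums[j] for j in cols)
            if total > best[0]:
                best = (total, rows, cols)

    return best
```

For example, on the weight matrix `[[3, -1, 2], [2, -4, 1], [-5, 2, -1]]` the search selects rows (0, 1) and columns (0, 2), for a total weight of 3 + 2 + 2 + 1 = 8. Negative weights are what make the problem nontrivial: adding a vertex can lower the total, so the best biclique is generally not the largest one.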

[1]  David Zuckerman, et al. Linear Degree Extractors and the Inapproximability of MAX CLIQUE and CHROMATIC NUMBER, Electronic Colloquium on Computational Complexity, Report No. 100, 2005.

[2]  Subhash Khot, et al. Ruling out PTAS for graph min-bisection, densest subgraph and bipartite clique, 2004, 45th Annual IEEE Symposium on Foundations of Computer Science.

[3]  Arlindo L. Oliveira, et al. Biclustering algorithms for biological data analysis: a survey, 2004, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[4]  Jayashankar M. Swaminathan, et al. Managing broader product lines through delayed differentiation using vanilla boxes, 1998.

[5]  Laks V. S. Lakshmanan, et al. MDL Summarization with Holes, 2005, VLDB.

[6]  Avrim Blum, et al. Correlation Clustering, 2004, Machine Learning.

[7]  René Peeters, et al. The maximum edge biclique problem is NP-complete, 2003, Discret. Appl. Math.

[8]  Milind Dawande, et al. On Bipartite and Multipartite Clique Problems, 2001, J. Algorithms.

[9]  Song Zhu, et al. Algorithmic and Complexity Issues of Three Clustering Methods in Microarray Data Analysis, 2005, Algorithmica.

[10]  J. Håstad. Clique is hard to approximate within $n^{1-\epsilon}$, 1999.

[11]  Roded Sharan, et al. Discovering statistically significant biclusters in gene expression data, 2002, ISMB.

[12]  Song Zhu, et al. A new clustering method for microarray data analysis, 2002, Proceedings. IEEE Computer Society Bioinformatics Conference.

[13]  Uriel Feige, et al. Relations between average case complexity and approximation complexity, 2002, STOC '02.

[14]  Anthony Wirth, et al. Correlation Clustering, 2010, Encyclopedia of Machine Learning and Data Mining.

[15]  Shimon Kogan, et al. Hardness of approximation of the Balanced Complete Bipartite Subgraph problem, 2004.

[16]  David S. Johnson, et al. Computers and Intractability: A Guide to the Theory of NP-Completeness, 1978.

[17]  Richard M. Karp, et al. Discovering local structure in gene expression data: the order-preserving submatrix problem, 2002, RECOMB '02.

[18]  Shaofeng Bu. The Summarization of Hierarchical Data with Exceptions, 2004.

[19]  George M. Church, et al. Biclustering of Expression Data, 2000, ISMB.

[20]  W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables, 1963.

[21]  Dana Ron, et al. On Finding Large Conjunctive Clusters, 2003, COLT.

[22]  J. Håstad. Clique is hard to approximate within $n^{1-\epsilon}$, 1996.

[23]  Subhash Khot. Ruling Out PTAS for Graph Min-Bisection, Densest Subgraph and Bipartite Clique, 2004, FOCS.