Minimax Localization of Structural Information in Large Noisy Matrices

We consider the problem of identifying a sparse set of relevant columns and rows in a large data matrix with highly corrupted entries. This problem of identifying groups from a collection of bipartite variables, such as proteins and drugs, biological species and gene sequences, or malware and signatures, is commonly referred to as biclustering or co-clustering. Despite its great practical relevance, and although several ad hoc methods are available for biclustering, theoretical analysis of the problem is largely non-existent. The problem we consider is also closely related to structured multiple hypothesis testing, an area of statistics that has recently witnessed a flurry of activity. We make the following contributions:
1. We prove lower bounds on the minimum signal strength needed for successful recovery of a bicluster, as a function of the noise variance, the size of the matrix, and the size of the bicluster of interest.
2. We show that a combinatorial procedure based on the scan statistic achieves this optimal limit.
3. We characterize the signal-to-noise ratio (SNR) required by several computationally tractable procedures for biclustering, including element-wise thresholding, column/row average thresholding, and a convex relaxation approach to sparse singular value decomposition.
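To make the procedures named in the contributions concrete, here is a minimal sketch in Python. It rests on assumptions that are ours, not necessarily the paper's: a single constant-mean bicluster, Y_ij = mu * 1{i in I, j in J} + Z_ij with Z_ij i.i.d. N(0, sigma^2), and known bicluster dimensions k1 x k2. We take the scan statistic to be the entry sum over k1 x k2 submatrices. The function names (make_matrix, scan_statistic, average_thresholding, elementwise_thresholding) are hypothetical, the threshold tau is an illustrative choice rather than the paper's calibration, and the convex relaxation for sparse SVD is not shown.

```python
import itertools
import math
import random


def make_matrix(n1, n2, rows, cols, mu, sigma, seed=0):
    """Hypothetical model: Y = mu * 1{i in rows, j in cols} + N(0, sigma^2) noise."""
    rng = random.Random(seed)
    return [[(mu if (i in rows and j in cols) else 0.0) + rng.gauss(0.0, sigma)
             for j in range(n2)] for i in range(n1)]


def top_k(scores, k):
    """Indices of the k largest scores, returned as a set."""
    order = sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)
    return set(order[:k])


def average_thresholding(Y, k1, k2):
    """Keep the k1 rows and k2 columns with the largest averages (k1, k2 assumed known)."""
    n1, n2 = len(Y), len(Y[0])
    row_means = [sum(row) / n2 for row in Y]
    col_means = [sum(Y[i][j] for i in range(n1)) / n1 for j in range(n2)]
    return top_k(row_means, k1), top_k(col_means, k2)


def elementwise_thresholding(Y, tau):
    """Keep every row and column that contains at least one entry above tau."""
    rows = {i for i, row in enumerate(Y) if any(y > tau for y in row)}
    cols = {j for j in range(len(Y[0])) if any(row[j] > tau for row in Y)}
    return rows, cols


def scan_statistic(Y, k1, k2):
    """Exhaustive scan for the k1 x k2 submatrix with the largest entry sum.

    For a fixed row set, the best columns are the k2 largest column sums
    over those rows, so only the row subsets need to be enumerated.
    """
    n2 = len(Y[0])
    best = (-math.inf, None, None)
    for rows in itertools.combinations(range(len(Y)), k1):
        col_sums = [sum(Y[i][j] for i in rows) for j in range(n2)]
        cols = top_k(col_sums, k2)
        score = sum(col_sums[j] for j in cols)
        if score > best[0]:
            best = (score, set(rows), cols)
    return best[1], best[2]


if __name__ == "__main__":
    true_rows, true_cols = {1, 4, 7}, {0, 2, 5}
    # Deliberately strong signal so that every procedure should succeed.
    Y = make_matrix(10, 12, true_rows, true_cols, mu=8.0, sigma=1.0, seed=1)
    print("scan:   ", scan_statistic(Y, 3, 3))
    print("average:", average_thresholding(Y, 3, 3))
    print("element:", elementwise_thresholding(Y, tau=4.0))
```

The scan reduces the search from all C(n1, k1) * C(n2, k2) submatrices to the C(n1, k1) row subsets, because for a fixed row set the best k2 columns are simply those with the largest column sums over that row set. It remains exponential in k1, which is why the abstract contrasts it with computationally tractable procedures. The abstract states that the scan statistic attains the optimal limit; this toy instance uses a strong signal (mu = 8, sigma = 1) so that all three procedures should localize the bicluster, and it says nothing about their relative SNR requirements.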
