SVD, discrepancy, and regular structure of contingency tables

Factors obtained by correspondence analysis are used to find a biclustering of a contingency table in which the row-column cluster pairs are regular, i.e., have small discrepancy. Our main theorem relates the constant of the so-called volume-regularity to the SVD of the normalized contingency table. The result applies to two-way cuts in which the rows and the columns are divided into the same number of clusters. It thereby extends, in part, the result of Butler, which estimates the discrepancy of a contingency table by the largest non-trivial singular value of the normalized table (one-cluster, rectangular case), and, in part, the result of Bolla, which estimates the constant of volume-regularity by the structural eigenvalues and the distances of the corresponding eigen-subspaces of the normalized modularity matrix of an edge-weighted graph (several clusters, symmetric case).
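The one-cluster case mentioned above can be illustrated numerically. The following is a minimal sketch, not the paper's method. It assumes the usual convention that the table is scaled to sum to 1, that the volume of a row or column set is the sum of its marginals, and that the normalized table is D_row^{-1/2} C D_col^{-1/2}. The example table and the names `normalized_table` and `discrepancy_ratio` are hypothetical, introduced only for illustration.

```python
import numpy as np

def normalized_table(table):
    """Scale a nonnegative table to sum 1 and return D_row^{-1/2} C D_col^{-1/2}.

    Assumes no row or column of the table is entirely zero.
    """
    C = np.asarray(table, dtype=float)
    C = C / C.sum()
    d_row = C.sum(axis=1)  # row marginals
    d_col = C.sum(axis=0)  # column marginals
    C_norm = C / np.sqrt(np.outer(d_row, d_col))
    return C_norm, C, d_row, d_col

def discrepancy_ratio(C, d_row, d_col, X, Y):
    """|c(X,Y) - Vol(X) Vol(Y)| / sqrt(Vol(X) Vol(Y)) for index lists X, Y."""
    c_xy = C[np.ix_(X, Y)].sum()
    vol_x = d_row[X].sum()
    vol_y = d_col[Y].sum()
    return abs(c_xy - vol_x * vol_y) / np.sqrt(vol_x * vol_y)

# Made-up 5x4 table with two visible row-column blocks (illustration only).
table = np.array([
    [20, 18,  2,  1],
    [19, 22,  1,  2],
    [ 2,  1, 25, 21],
    [ 1,  3, 19, 24],
    [ 2,  2, 23, 20],
])

C_norm, C, d_row, d_col = normalized_table(table)
singular_values = np.linalg.svd(C_norm, compute_uv=False)

# The largest singular value is the trivial one; the next is the largest
# non-trivial singular value.
s_trivial, s_1 = singular_values[0], singular_values[1]

# Discrepancy ratio for one particular row subset X and column subset Y.
ratio = discrepancy_ratio(C, d_row, d_col, [0, 1], [0, 1])
print(s_trivial, s_1, ratio)
```

Under these assumptions the trivial singular value equals 1, and the ratio for any pair of subsets is bounded by the largest non-trivial singular value. This is the easy direction of the relationship between discrepancy and singular values. The multi-cluster statement of the main theorem is not reproduced here.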

[1]  John Scott, et al.  Using Correspondence Analysis for Joint Displays of Affiliation Networks, 2005.

[2]  Alan M. Frieze, et al.  Quick Approximation to Matrices and Applications, 1999, Comb.

[3]  Marianna Bolla.  Spectra and structure of weighted graphs, 2011, Electron. Notes Discret. Math.

[4]  V. Sós, et al.  Convergent Sequences of Dense Graphs I: Subgraph Frequencies, Metric Properties and Testing, 2007, math/0702004.

[5]  Marianna Bolla, et al.  Spectra and optimal partitions of weighted graphs, 1994, Discret. Math.

[6]  Arlindo L. Oliveira, et al.  Biclustering algorithms for biological data analysis: a survey, 2004, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[7]  Jianhua Z. Huang, et al.  Biclustering via Sparse Singular Value Decomposition, 2010, Biometrics.

[8]  Inderjit S. Dhillon, et al.  Co-clustering documents and words using bipartite spectral graph partitioning, 2001, KDD '01.

[9]  Marianna Bolla, et al.  Singular value decomposition of large random matrices (for two-way classification of microarrays), 2008, J. Multivar. Anal.

[10]  R. Bhatia.  Matrix Analysis, 1996.

[11]  Nathan Linial, et al.  Lifts, Discrepancy and Nearly Optimal Spectral Gap, 2006, Comb.

[12]  N. Linial, et al.  Lifts, Discrepancy and Nearly Optimal Spectral Gaps, 2003.

[13]  Marianna Bolla, et al.  Modularity spectra, eigen-subspaces, and structure of weighted graphs, 2013, Eur. J. Comb.

[14]  Desmond J. Higham, et al.  Analysis of the Singular Value Decomposition as a Tool for Processing Microarray Expression Data, 2005.

[15]  Joseph T. Chang, et al.  Spectral biclustering of microarray data: coclustering genes and conditions, 2003, Genome research.

[16]  László Lovász, et al.  Limits of dense graph sequences, 2004, J. Comb. Theory B.

[17]  Hong Yan, et al.  Biclustering of Microarray Data Based on Singular Value Decomposition, 2007, PAKDD Workshops.

[18]  Béla Bollobás, et al.  Hermitian matrices and graphs: singular values and discrepancy, 2004, Discret. Math.

[19]  C. Radhakrishna Rao, et al.  Separation theorems for singular values of matrices and their applications in multivariate analysis, 1979.

[20]  S. Butler.  Using discrepancy to control singular values for nonnegative matrices, 2006.

[21]  Alan M. Frieze, et al.  Clustering Large Graphs via the Singular Value Decomposition, 2004, Machine Learning.

[22]  Panos M. Pardalos, et al.  Biclustering in data mining, 2008, Comput. Oper. Res.

[23]  R. Clarke, et al.  Theory and Applications of Correspondence Analysis, 1985.

[24]  J. Hartigan.  Direct Clustering of a Data Matrix, 1972.

[25]  Li Liu, et al.  Robust singular value decomposition analysis of microarray data, 2003, Proceedings of the National Academy of Sciences of the United States of America.

[26]  Alan M. Frieze, et al.  Fast monte-carlo algorithms for finding low-rank approximations, 2004, JACM.