Multidimensional partitioning and bi-partitioning

Eigenvectors and, more generally, singular vectors have proved to be useful tools for data mining and dimension reduction. Spectral clustering and reordering algorithms have been designed and implemented in many disciplines, and they can be motivated from several different standpoints. Here we give a general, unified derivation from an applied linear algebra perspective. We use a variational approach that has the benefit of (a) naturally introducing an appropriate scaling, (b) allowing for a solution in any desired dimension, and (c) dealing with both the clustering and bi-clustering issues in the same framework. The motivation and analysis are then backed up with examples involving two large data sets from modern, high-throughput, experimental cell biology. Here, the objects of interest are genes and tissue samples, and the experimental data represent gene activity. We show that looking beyond the dominant, or Fiedler, direction reveals important information.
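To make the scaling and the "any desired dimension" points concrete, the following is a minimal sketch of one common spectral bi-partitioning recipe. It is an illustration under stated assumptions and not necessarily the derivation used in the paper: the data matrix `W` is assumed nonnegative with strictly positive row and column sums, the scaling is assumed to be by the square roots of those sums, and the function name `spectral_bipartition` and the toy matrix are hypothetical.

```python
import numpy as np

def spectral_bipartition(W, k=1):
    """Sketch of scaled-SVD bi-partitioning (assumed formulation).

    W : nonnegative (rows x columns) data matrix, e.g. genes x samples.
    k : number of directions to keep beyond the leading, trivial one.
    Returns k-dimensional coordinates for rows and for columns,
    plus the singular values of the scaled matrix.
    """
    W = np.asarray(W, dtype=float)
    d_row = W.sum(axis=1)  # row sums (assumed > 0)
    d_col = W.sum(axis=0)  # column sums (assumed > 0)

    # Scale each entry by 1 / sqrt(row sum * column sum).
    W_scaled = W / np.sqrt(np.outer(d_row, d_col))

    U, s, Vt = np.linalg.svd(W_scaled, full_matrices=False)

    # The leading singular triplet (singular value 1) carries no
    # partitioning information, so use triplets 2, ..., k+1 and undo
    # the square-root scaling.
    row_coords = U[:, 1:k + 1] / np.sqrt(d_row)[:, None]
    col_coords = Vt[1:k + 1, :].T / np.sqrt(d_col)[:, None]
    return row_coords, col_coords, s

# Hypothetical toy data: rows 0-1 are active in columns 0-2,
# rows 2-3 are active in columns 3-4.
W = np.array([
    [5.0, 5.0, 5.0, 0.1, 0.1],
    [5.0, 5.0, 5.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 5.0, 5.0],
    [0.1, 0.1, 0.1, 5.0, 5.0],
])
rows, cols, s = spectral_bipartition(W, k=1)
row_groups = rows[:, 0] > 0  # sign of the first retained direction
col_groups = cols[:, 0] > 0
```

With `k=1` the sign pattern of the single retained direction splits rows and columns into two groups at once, and rows share a sign with the columns in which they are active. Setting `k` larger keeps further singular vectors, which corresponds to looking beyond the dominant direction.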
