Hierarchical Information Clustering by Means of Topologically Embedded Graphs

We introduce a graph-theoretic approach to extract clusters and hierarchies in complex data-sets in an unsupervised and deterministic manner, without the use of any prior information. This is achieved by building topologically embedded networks that contain the subset of most significant links, and then analyzing the network structure. For a planar embedding, the method provides both the intra-cluster hierarchy, which describes how clusters are composed, and the inter-cluster hierarchy, which describes how clusters gather together. We assess the performance, robustness and reliability of the method by first investigating several artificial data-sets, finding that it can significantly outperform other established approaches. We then show that the method can successfully differentiate meaningful clusters and hierarchies in a variety of real data-sets. In particular, we find that its application to gene expression patterns of lymphoma samples uncovers biologically significant groups of genes which play key roles in the diagnosis, prognosis and treatment of some of the most relevant human lymphoid malignancies.
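As an illustration of what building a planar network from the most significant links can look like, the following is a minimal sketch and not the procedure of the paper itself. It assumes a greedy filter in the spirit of planar maximally filtered graphs: candidate links are visited from strongest to weakest, and a link is kept only if the graph stays planar. The function name `planar_filtered_graph` and the toy similarity matrix are hypothetical; `networkx.check_planarity` is the only library call relied upon.

```python
import itertools

import networkx as nx


def planar_filtered_graph(similarity):
    """Greedily keep the strongest links that preserve planarity.

    `similarity` is a symmetric square matrix given as a list of lists.
    This is an illustrative assumption, not the authors' exact method.
    """
    n = len(similarity)
    # Rank all node pairs from most to least similar.
    pairs = sorted(
        itertools.combinations(range(n), 2),
        key=lambda p: similarity[p[0]][p[1]],
        reverse=True,
    )
    graph = nx.Graph()
    graph.add_nodes_from(range(n))
    # A planar graph on n >= 3 nodes has at most 3 * (n - 2) edges.
    max_edges = max(3 * (n - 2), n - 1)
    for i, j in pairs:
        if graph.number_of_edges() >= max_edges:
            break
        graph.add_edge(i, j, weight=similarity[i][j])
        is_planar, _ = nx.check_planarity(graph)
        if not is_planar:
            # This link would break the planar embedding, so discard it.
            graph.remove_edge(i, j)
    return graph


# Toy data: two groups of three nodes, similar within and dissimilar across.
toy = [
    [1.0 if a == b else (0.9 if a // 3 == b // 3 else 0.1) for b in range(6)]
    for a in range(6)
]
filtered = planar_filtered_graph(toy)
print(sorted(filtered.edges()))
```

In this toy case the six strong within-group links are always retained, and the remaining planar capacity is filled with the weaker between-group links. Extracting clusters and hierarchies from the resulting network is a separate step that this sketch does not attempt.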
