Watersheds on Hypergraphs for Data Clustering

We present a novel extension of watershed cuts to hypergraphs, allowing data represented as a hypergraph to be clustered in a data science context. In contrast to the methods in the literature, data instances are represented not as nodes but as edges of the hypergraph. The properties associated with each instance are used to define the nodes and the feature vectors associated with the edges. This rich representation was previously unexplored and leads to a data clustering algorithm that considers the induced topology and data similarity simultaneously. We illustrate the capabilities of our method on a dataset of movies, demonstrating that knowledge from mathematical morphology can be used beyond image processing, for the visual analytics of network data. More results, the data, and the source code used in this work are available at https://github.com/015988/hypershed.
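The representation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy movies, the property names, the two-dimensional feature vectors, and the helper names `incidence` and `weighted_adjacency` are all assumptions made for illustration. Each movie is a hyperedge whose nodes are its properties, and two hyperedges are treated as adjacent when they share at least one node, with the Euclidean distance between their feature vectors as one possible dissimilarity weight.

```python
from itertools import combinations
from math import dist

# Hypothetical toy data: each movie is one hyperedge. Its nodes are the
# properties it touches; its feature vector is attached to the edge itself.
movies = {
    "movie_a": {"nodes": {"actor_1", "director_1"}, "features": (0.9, 0.1)},
    "movie_b": {"nodes": {"actor_1", "actor_2"}, "features": (0.8, 0.2)},
    "movie_c": {"nodes": {"actor_3", "director_2"}, "features": (0.1, 0.7)},
}


def incidence(hyperedges):
    """Map each node to the set of hyperedges that contain it."""
    table = {}
    for name, edge in hyperedges.items():
        for node in edge["nodes"]:
            table.setdefault(node, set()).add(name)
    return table


def weighted_adjacency(hyperedges):
    """Weight every pair of hyperedges that shares at least one node.

    The weight is the Euclidean distance between the two feature vectors,
    an assumed dissimilarity chosen only for this sketch.
    """
    weights = {}
    for a, b in combinations(sorted(hyperedges), 2):
        if hyperedges[a]["nodes"] & hyperedges[b]["nodes"]:
            weights[(a, b)] = dist(
                hyperedges[a]["features"], hyperedges[b]["features"]
            )
    return weights


if __name__ == "__main__":
    print(incidence(movies)["actor_1"])
    print(weighted_adjacency(movies))
```

In this toy example only `movie_a` and `movie_b` are adjacent, because they share `actor_1`. A weighted adjacency structure of this kind combines topology (which edges touch) with data similarity (how far apart their features are), which is the type of input a watershed-style clustering could operate on.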
