Mixed Membership Graph Clustering via Systematic Edge Query

This work considers clustering the nodes of a largely incomplete graph. In this setting, only a small number of edge queries can be made, and the entire graph is never observed. The problem arises in large-scale data clustering with limited annotations, community detection under restricted survey resources, and graph topology inference when node interactions are hidden or removed. Prior works have tackled this problem from various perspectives, e.g., convex programming-based low-rank matrix completion and active query-based clique finding. Nonetheless, many existing methods are designed to estimate single-cluster membership of the nodes, whereas nodes often have mixed (i.e., multi-cluster) membership in practice. In addition, some query and computational paradigms, e.g., the random query patterns and nuclear norm-based optimization advocated in the convex approaches, may give rise to scalability and implementation challenges. This work aims at learning the mixed membership of nodes from queried edges. The proposed method is developed together with a systematic query principle that system designers can control and adjust to accommodate implementation challenges, e.g., to avoid querying edges that are physically hard to acquire. Our framework also features a lightweight and scalable algorithm with membership learning guarantees. Real-data experiments on crowdclustering and community detection showcase the effectiveness of our method.
