Predicting Positive and Negative Links with Noisy Queries: Theory & Practice

Social networks involve both positive and negative relationships, which can be captured in signed graphs. The {\em edge sign prediction problem} aims to predict whether an interaction between a pair of nodes will be positive or negative. We provide theoretical results for this problem that motivate natural improvements to recent heuristics. The edge sign prediction problem is related to correlation clustering: a positive relationship between two nodes means that they belong to the same cluster. We consider the following model for two clusters: we may query whether any pair of nodes belongs to the same cluster, but the answer to each query is corrupted with some probability $0<q<\frac{1}{2}$. Let $\delta=1-2q$ be the bias. We provide an algorithm that recovers all signs correctly with high probability in the presence of noise using $O(\frac{n\log n}{\delta^2}+\frac{\log^2 n}{\delta^6})$ queries. This is the best known result for this problem for all but tiny $\delta$, improving on the recent work of Mazumdar and Saha \cite{mazumdar2017clustering}. We also provide an algorithm that performs $O(\frac{n\log n}{\delta^4})$ queries and uses breadth-first search as its main algorithmic primitive. While both the running time and the number of queries of this algorithm are sub-optimal, our result relies on novel theoretical techniques and naturally suggests the use of edge-disjoint paths as a feature for predicting signs in online social networks. Correspondingly, we experiment with using short edge-disjoint $s$-$t$ paths as a feature for predicting the sign of edge $(s,t)$ in real-world signed networks. Empirical findings suggest that the use of such paths improves classification accuracy, especially for pairs of nodes with no common neighbors.
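To make the query model and the role of edge-disjoint paths concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the algorithm analyzed in the paper: the function names (`noisy_query`, `path_sign`, `predict_sign`) are hypothetical, the paths are supplied by the caller, and a plain majority vote stands in for whatever aggregation rule the paper uses. The underlying fact is that the product of $\ell$ independent noisy answers along a path agrees with the true sign of $(s,t)$ with probability $\frac{1+\delta^\ell}{2}$, so many edge-disjoint paths give many independent, weakly biased votes.

```python
import random


def noisy_query(labels, u, v, q, rng):
    """Answer +1 if u and v share a cluster, else -1; flip the answer with probability q."""
    truth = 1 if labels[u] == labels[v] else -1
    return -truth if rng.random() < q else truth


def path_sign(path, sign):
    """Multiply the edge signs along a path given as a list of nodes."""
    product = 1
    for u, v in zip(path, path[1:]):
        product *= sign(u, v)
    return product


def predict_sign(paths, sign):
    """Majority vote over the sign products of edge-disjoint s-t paths (ties go to +1)."""
    total = sum(path_sign(p, sign) for p in paths)
    return 1 if total >= 0 else -1


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed toy instance: nodes 0 and 1 are in different clusters,
    # and nodes 2..401 alternate between the two clusters.
    labels = [0, 1] + [i % 2 for i in range(400)]
    # Length-2 paths 0 - w - 1 through distinct w are edge-disjoint,
    # so every edge is queried exactly once.
    paths = [[0, w, 1] for w in range(2, 402)]
    q = 0.2  # noise level, so delta = 1 - 2q = 0.6 and each path has bias delta**2
    prediction = predict_sign(paths, lambda u, v: noisy_query(labels, u, v, q, rng))
    print("predicted sign of (0, 1):", prediction)
```

Each edge is queried only once here, which keeps the votes independent; reusing an edge across paths would correlate their errors, which is why edge-disjointness matters in this sketch.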

[1]  Eli Upfal, et al. Probability and Computing: Randomized Algorithms and Probabilistic Analysis, 2005.

[2]  Emmanuel J. Candès, et al. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, 2004, IEEE Transactions on Information Theory.

[3]  Béla Bollobás, et al. Random Graphs, 1985.

[4]  F. Harary, et al. Structural Balance: A Generalization of Heider's Theory, 1977.

[5]  William E. Butterworth, et al. The Enemy of My Enemy, 2018.

[6]  P. Cochat, et al. Et al, 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[7]  Bruce E. Hajek, et al. Achieving exact cluster recovery threshold via semidefinite programming, 2015, 2015 IEEE International Symposium on Information Theory (ISIT).

[8]  Roded Sharan, et al. Cluster Graph Modification Problems, 2002, WG.

[9]  E. Todeva. Networks, 2007.

[10]  Alan M. Frieze, et al. Rainbow Connectivity of Sparse Random Graphs, 2012, APPROX-RANDOM.

[11]  Van Vu, et al. A Simple SVD Algorithm for Finding Hidden Partitions, 2014, Combinatorics, Probability and Computing.

[12]  Ryan O'Donnell, et al. Analysis of Boolean Functions, 2014, ArXiv.

[13]  Jure Leskovec, et al. Predicting positive and negative links in online social networks, 2010, WWW '10.

[14]  Charalampos E. Tsourakakis. Mathematical and Algorithmic Analysis of Network and Biological Data, 2014, ArXiv.

[15]  Jakub W. Pachocki, et al. Scalable Motif-aware Graph Clustering, 2016, WWW.

[16]  Arya Mazumdar, et al. Clustering with Noisy Queries, 2017, NIPS.

[17]  Edo Liberty, et al. Correlation clustering: from theory to practice, 2014, KDD.

[18]  Hector Garcia-Molina, et al. Entity Resolution with crowd errors, 2015, 2015 IEEE 31st International Conference on Data Engineering.

[19]  Richard Caldwell. The Enemy of My Enemy, 2018.

[20]  Claire Mathieu, et al. Correlation clustering with noisy input, 2010, SODA '10.

[21]  Nagarajan Natarajan, et al. Exploiting longer cycles for link prediction in signed networks, 2011, CIKM '11.

[22]  David Easley, et al. Networks, Crowds, and Markets: Reasoning about a Highly Connected World, 2010.

[23]  Claudio Gentile, et al. A Correlation Clustering Approach to Link Classification in Signed Networks, 2012, COLT.

[24]  F. Harary. On the notion of balance of a signed graph, 1953.

[25]  Alan M. Frieze, et al. Optimal construction of edge-disjoint paths in random graphs, 1994, SODA '94.

[26]  Emmanuel Abbe, et al. Exact Recovery in the Stochastic Block Model, 2014, IEEE Transactions on Information Theory.

[27]  Yudong Chen, et al. Clustering Partially Observed Graphs via Convex Optimization, 2011, ICML.

[28]  Jure Leskovec, et al. Signed networks in social media, 2010, CHI.

[29]  Charalampos E. Tsourakakis. Streaming Graph Partitioning in the Planted Partition Model, 2014, COSN.

[30]  Alon Itai, et al. The complexity of finding maximum disjoint paths with length constraints, 1982, Networks.

[31]  Chris Arney, et al. Networks, Crowds, and Markets: Reasoning about a Highly Connected World (Easley, D. and Kleinberg, J.; 2010) [Book Review], 2013, IEEE Technology and Society Magazine.

[32]  Sujay Sanghavi, et al. Clustering Sparse Graphs, 2012, NIPS.

[33]  Charalampos E. Tsourakakis, et al. Predicting Signed Edges with O(n log n) Queries, 2016, ArXiv.

[34]  Andrzej Dudek, et al. Rainbow Connection of Random Regular Graphs, 2013, SIAM J. Discret. Math.

[35]  Frank McSherry, et al. Spectral partitioning of random graphs, 2001, Proceedings 2001 IEEE International Conference on Cluster Computing.

[36]  F. Heider. Attitudes and Cognitive Organization, 1977.

[37]  K. E. Read, et al. Cultures of the Central Highlands, New Guinea, 1954, Southwestern Journal of Anthropology.

[38]  Avrim Blum, et al. Correlation Clustering, 2004, Machine Learning.

[40]  Aravindan Vijayaraghavan, et al. Correlation Clustering with Noisy Partial Information, 2014, COLT.

[41]  Jian Ma, et al. A new correlation clustering method for cancer mutation analysis, 2016, Bioinform.

[42]  Gaël Varoquaux, et al. Scikit-learn: Machine Learning in Python, 2011, J. Mach. Learn. Res.

[43]  Arya Mazumdar, et al. Clustering Via Crowdsourcing, 2016, ArXiv.

[44]  Nagarajan Natarajan, et al. Prediction and clustering in signed networks: a local to global perspective, 2013, J. Mach. Learn. Res.

[45]  Burr Settles, et al. Active Learning Literature Survey, 2009.