Unsupervised robust nonparametric learning of hidden community properties

We consider the problem of learning fundamental properties of communities in large noisy networks, in the prototypical situation where the nodes or users are split into two classes according to a binary property, e.g., their opinions or preferences on a topic. To learn these properties, we propose a nonparametric, unsupervised, and scalable graph scan procedure that is, in addition, robust against a class of powerful adversaries. In our setup, one of the communities can fall under the influence of a knowledgeable adversarial leader who knows the full network structure, has unlimited computational resources, and can completely foresee our planned actions on the network. We prove strong consistency of our procedure in this setup under minimal assumptions. In particular, the learning procedure estimates the baseline activity of normal users asymptotically correctly with probability 1, the only assumption being the existence of a single implicit community of asymptotically negligible logarithmic size. We provide experiments on real and synthetic data to illustrate the performance of our method, including examples with adversaries.
