The Probabilistic Hitting Set Paradigm: a General Framework for Search and Detection in Dynamic Social Networks

We formulate and study the Probabilistic Hitting Set Paradigm (PHSP), a general framework for the design and analysis of search and detection algorithms in large-scale dynamic networks. The PHSP captures applications ranging from monitoring new content on the web, blogosphere, and Twitterverse, to analyzing influence properties in social networks, to detecting failure propagation in large electronic circuits. The PHSP defines an infinite-time generating process that places new items in subsets of nodes according to an unknown probability distribution that may change over time. The freshness, or relevance, of an item decays exponentially over time, and the goal is to compute a dynamic probing schedule that probes one or a few nodes per step and maximizes the expected sum of the relevance of the items discovered at each step. We develop an efficient sampling method for estimating the network parameters and an efficient optimization algorithm for obtaining an optimal probing schedule. We also present a scalable solution on the MapReduce platform. Finally, we apply our method to real social networks, demonstrating the practicality and optimality of our solution.
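
To make the setting concrete, the following is a minimal simulation sketch of the process described above. It is not the paper's algorithm. It rests on several assumptions that the abstract does not state: each subset generates a new item independently with a fixed per-step probability, exactly one node is probed per step, probing a node discovers every pending item placed on that node, and an item discovered at age a contributes theta ** a to the objective. The names `simulate`, `schedule`, `subsets`, `probs`, and `theta` are hypothetical.

```python
import random


def simulate(schedule, subsets, probs, theta=0.75, steps=1000, seed=0):
    """Return the total discounted relevance collected by a probing schedule.

    schedule: function mapping a step index to the node probed at that step
    subsets:  list of node sets in which a new item may be placed (assumed model)
    probs:    per-step generation probability of each subset (assumed fixed here)
    theta:    per-step decay factor of an undiscovered item's relevance
    """
    rng = random.Random(seed)
    pending = []  # (birth_step, node_set) for items not yet discovered
    total = 0.0
    for t in range(steps):
        # Generation: each subset independently spawns a new item this step.
        for node_set, p in zip(subsets, probs):
            if rng.random() < p:
                pending.append((t, node_set))
        # Probe one node; every pending item placed on it is discovered
        # and contributes its decayed relevance theta ** age.
        node = schedule(t)
        still_pending = []
        for birth, node_set in pending:
            if node in node_set:
                total += theta ** (t - birth)
            else:
                still_pending.append((birth, node_set))
        pending = still_pending
    return total


if __name__ == "__main__":
    # Two hypothetical subsets over three nodes; node 1 hits both of them.
    subsets = [frozenset({0, 1}), frozenset({1, 2})]
    probs = [0.3, 0.3]
    always_hub = simulate(lambda t: 1, subsets, probs)
    round_robin = simulate(lambda t: t % 3, subsets, probs)
    print("always probe node 1:", always_hub)
    print("round robin:", round_robin)
```

In this toy instance node 1 belongs to both subsets, so a schedule that always probes it discovers every item at age zero, while a round-robin schedule lets some items age before they are found. Comparing the two totals illustrates why the choice of probing schedule matters under exponential decay.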
