Stop-and-Stare: Optimal Sampling Algorithms for Viral Marketing in Billion-scale Networks

Influence Maximization (IM), which seeks a small set of key users who spread influence widely through a network, is a core problem in multiple domains. It finds applications in viral marketing, epidemic control, and assessing cascading failures within complex systems. Despite extensive effort, IM in billion-scale networks such as Facebook, Twitter, and the World Wide Web has not been satisfactorily solved. Even state-of-the-art methods such as TIM+ and IMM may take days on those networks. In this paper, we propose SSA and D-SSA, two novel sampling frameworks for IM-based viral marketing problems. SSA and D-SSA are up to 1200 times faster than IMM, the best method from SIGMOD'15, while providing the same (1-1/e-ε) approximation guarantee. Underlying our frameworks is an innovative Stop-and-Stare strategy in which the algorithms stop at exponentially spaced checkpoints to verify (stare) whether there is adequate statistical evidence of the solution quality. Theoretically, we prove that SSA and D-SSA are the first approximation algorithms that use an (asymptotically) minimum number of samples, meeting strict theoretical thresholds characterized for IM. The superiority of SSA and D-SSA is confirmed through extensive experiments on real network data for IM and for another, topic-aware viral marketing problem named TVM.
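To make the Stop-and-Stare idea concrete, the following is a minimal Python sketch, not the SSA or D-SSA algorithms themselves. It assumes the independent cascade model and reverse-reachable (RR) set sampling, and it replaces the paper's statistical stopping thresholds with a simple relative-error check. The names `random_rr_set`, `greedy_cover`, and `stop_and_stare`, and the parameters `n0` and `max_samples`, are hypothetical and chosen only for illustration.

```python
import random


def random_rr_set(nodes, in_edges, rng):
    """Sample one reverse-reachable set (independent cascade assumed)."""
    root = rng.choice(nodes)
    seen = {root}
    stack = [root]
    while stack:
        v = stack.pop()
        # in_edges[v] lists (u, p): edge u -> v is live with probability p
        for u, p in in_edges.get(v, []):
            if u not in seen and rng.random() < p:
                seen.add(u)
                stack.append(u)
    return seen


def greedy_cover(rr_sets, k):
    """Greedy max coverage: choose up to k nodes covering the most RR sets."""
    covered = [False] * len(rr_sets)
    seeds = []
    for _ in range(k):
        counts = {}
        for i, rr in enumerate(rr_sets):
            if not covered[i]:
                for u in rr:
                    counts[u] = counts.get(u, 0) + 1
        if not counts:
            break
        best = max(counts, key=counts.get)
        seeds.append(best)
        for i, rr in enumerate(rr_sets):
            if best in rr:
                covered[i] = True
    return seeds, sum(covered) / len(rr_sets)


def stop_and_stare(nodes, in_edges, k, eps=0.1, n0=64,
                   max_samples=1 << 16, seed=0):
    """Double the sample size until an independent check agrees (simplified)."""
    rng = random.Random(seed)
    rr_sets = [random_rr_set(nodes, in_edges, rng) for _ in range(n0)]
    while True:
        # Stop: solve max coverage on the samples collected so far.
        seeds, est = greedy_cover(rr_sets, k)
        # Stare: re-estimate the coverage on fresh, independent samples.
        check = [random_rr_set(nodes, in_edges, rng) for _ in rr_sets]
        seed_set = set(seeds)
        verify = sum(1 for rr in check if seed_set & rr) / len(check)
        # Simplified stopping rule (an assumption, not the paper's condition).
        if est <= (1 + eps) * verify or len(rr_sets) >= max_samples:
            return seeds, len(nodes) * verify
        # Exponential checkpoint: double the number of samples and retry.
        extra = len(rr_sets)
        rr_sets.extend(random_rr_set(nodes, in_edges, rng) for _ in range(extra))
```

As a usage example, on a small star graph where node 0 influences every other node with probability 1, calling `stop_and_stare(list(range(5)), {v: [(0, 1.0)] for v in range(1, 5)}, k=1)` selects node 0 as the single seed.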
