Bring Order into the Samples: A Novel Scalable Method for Influence Maximization

As a key problem in viral marketing, influence maximization has been extensively studied in the literature. Given a positive integer $k$, a social network $\mathcal{G}$, and a propagation model, it aims to find a set of $k$ nodes with the largest influence spread. The state-of-the-art method, IMM, is based on the reverse influence sampling (RIS) framework. By using the martingale technique, it greatly outperforms previous methods in efficiency. However, IMM still has limited scalability because of the high overhead of deciding a tight sample size. In this paper, instead of spending effort on deciding a tight sample size, we present a novel bottom-k sketch based RIS framework, namely BKRIS, which brings the order of samples into the RIS framework. By applying the sketch technique, we derive early termination conditions that significantly accelerate the seed set selection procedure. Moreover, we provide a cost-effective method to find a proper sample size that bounds the quality of the returned result. In addition, we provide several optimization techniques to reduce the cost of generating the samples' order and to deal efficiently with the worst-case scenario. We demonstrate the efficiency and effectiveness of the proposed method on 10 real-world datasets. Compared with IMM, BKRIS achieves up to two orders of magnitude speedup with almost the same influence spread. On the largest dataset, with 1.8 billion edges, BKRIS returns 50 seeds in 1.3 seconds and 5,000 seeds in 36.6 seconds, whereas IMM takes 55.32 seconds and 3,664.97 seconds, respectively.
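For readers unfamiliar with the RIS framework that both IMM and BKRIS build on, the following is a minimal sketch of plain RIS, not of BKRIS itself. It assumes the independent cascade propagation model and a fixed, caller-chosen sample size; the bottom-k sample ordering, the early termination conditions, and the sample size selection described in the abstract are not reproduced here. All function names and the `(u, v, p)` edge format are hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def sample_rr_set(in_neighbors, nodes, rng):
    """Sample one reverse reachable (RR) set (assumes independent cascade)."""
    root = rng.choice(nodes)
    rr, stack = {root}, [root]
    while stack:
        v = stack.pop()
        # Walk edges backwards; each edge (u -> v) is live with probability p.
        for u, p in in_neighbors.get(v, ()):
            if u not in rr and rng.random() < p:
                rr.add(u)
                stack.append(u)
    return rr


def greedy_seeds(rr_sets, k):
    """Greedily pick up to k nodes that together cover the most RR sets."""
    covers = defaultdict(set)  # node -> indices of the RR sets containing it
    for i, rr in enumerate(rr_sets):
        for u in rr:
            covers[u].add(i)
    seeds, covered = [], set()
    for _ in range(k):
        best = max(covers, key=lambda u: len(covers[u] - covered), default=None)
        if best is None:  # fewer than k candidate nodes remain
            break
        seeds.append(best)
        covered |= covers.pop(best)
    return seeds, len(covered)


def ris_influence_max(edges, k, num_samples, seed=0):
    """Return (seed nodes, estimated spread) for edges given as (u, v, p)."""
    rng = random.Random(seed)
    in_neighbors = defaultdict(list)
    node_set = set()
    for u, v, p in edges:
        in_neighbors[v].append((u, p))
        node_set.update((u, v))
    nodes = sorted(node_set)
    rr_sets = [sample_rr_set(in_neighbors, nodes, rng) for _ in range(num_samples)]
    seeds, covered = greedy_seeds(rr_sets, k)
    # The fraction of RR sets covered, scaled by n, estimates the spread.
    return seeds, len(nodes) * covered / num_samples
```

On a small star graph, for example, `ris_influence_max([(0, i, 1.0) for i in range(1, 6)], k=1, num_samples=200)` selects the hub node `0`, since every RR set contains it. In this plain form the greedy step scans all samples for each candidate; the abstract indicates that BKRIS avoids much of that work by ordering the samples and terminating early.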
