Know by a handful the whole sack: efficient sampling for top-k influential user identification in large graphs

Influence Maximization aims to find the top-k influential individuals who maximize the influence spread within a social network; it remains an important yet challenging problem. Most existing greedy algorithms focus on computing the exact influence spread, which leads to low computational efficiency and limits their application to real-world social networks. In this paper, we show that through supervised sampling we can efficiently estimate the influence spread at a negligible cost in precision, thus significantly reducing the execution time. Motivated by this, we propose ESMCE, a power-law exponent supervised Monte Carlo estimation method. In particular, ESMCE exploits the power-law exponent of the social network to guide the sampling and employs multiple iterative steps to guarantee the estimation accuracy. Moreover, ESMCE scales well and is well suited to large-scale social networks. Extensive experiments on six real-world social networks demonstrate that, compared with state-of-the-art greedy algorithms, ESMCE achieves a speedup of almost two orders of magnitude in execution time with only a negligible error (2.21% on average) in influence spread.
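For context, the sketch below illustrates the kind of Monte Carlo greedy baseline whose cost the abstract refers to. It is not ESMCE itself. The abstract does not say how the power-law exponent guides the sampling, so that part is not reproduced.

Several choices are assumptions made only for illustration:

* the Independent Cascade diffusion model with one uniform propagation probability `p` (the abstract does not name a diffusion model);
* an adjacency-list graph;
* the hypothetical helper names `simulate_ic`, `estimate_spread` and `greedy_top_k`.

```python
import random
from typing import Dict, List, Optional, Set

# Directed graph as an adjacency list: node -> out-neighbours (assumed representation).
Graph = Dict[int, List[int]]


def simulate_ic(graph: Graph, seeds: Set[int], p: float, rng: random.Random) -> int:
    """Run one Independent Cascade simulation; return the number of activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, []):
                # A newly activated node gets a single chance to activate each
                # still-inactive out-neighbour, succeeding with probability p.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def estimate_spread(graph: Graph, seeds: Set[int], p: float,
                    num_samples: int, rng: random.Random) -> float:
    """Monte Carlo estimate of the expected spread: mean cascade size over num_samples runs."""
    total = sum(simulate_ic(graph, seeds, p, rng) for _ in range(num_samples))
    return total / num_samples


def greedy_top_k(graph: Graph, k: int, p: float, num_samples: int,
                 seed: int = 0) -> Set[int]:
    """Greedy seed selection: repeatedly add the node with the largest estimated spread."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    seeds: Set[int] = set()
    for _ in range(min(k, len(nodes))):
        best_node: Optional[int] = None
        best_spread = -1.0
        # Every candidate costs num_samples full cascades, so selection needs about
        # k * n * num_samples simulations: the cost that sampling-based estimation targets.
        for v in sorted(nodes - seeds):
            spread = estimate_spread(graph, seeds | {v}, p, num_samples, rng)
            if spread > best_spread:  # strict '>' keeps the earliest node on ties
                best_node, best_spread = v, spread
        seeds.add(best_node)
    return seeds


if __name__ == "__main__":
    # Toy star graph: hub 0 points to leaves 1..5.
    star = {0: [1, 2, 3, 4, 5]}
    print(greedy_top_k(star, k=1, p=0.5, num_samples=200))  # prints {0}
```

Maximizing the estimated spread of `seeds | {v}` selects the same node as maximizing the marginal gain, because the spread of the current seed set is common to every candidate. Under this sketch's assumptions, a sampling method that cuts `num_samples`, or the number of candidates evaluated, attacks the dominant cost directly.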
