Sparse Multi-Task Learning for Detecting Influential Nodes in an Implicit Diffusion Network

How to identify influential nodes is a central research topic in information diffusion analysis. Many existing methods rely on the assumption that the network structure is completely known by the model. However, in many applications, such a network is either unavailable or insufficient to explain the underlying information diffusion phenomena. To address this challenge, we develop a multi-task sparse linear influence model (MSLIM), which can simultaneously predict the volume for each contagion and automatically identify sets of the most influential nodes for different contagions. Our method is based on the linear influence model with two main advantages: 1) it does not require the network structure; 2) it can detect different sets of the most influential nodes for different contagions. To solve the corresponding convex optimization problem for learning the model, we adopt the accelerated gradient method (AGM) framework and show that there is an exact closed-form solution for the proximal mapping. Therefore, the optimization procedure achieves the optimal first-order convergence rate and can be scaled to very large datasets. The proposed model is validated on a set of 2.6 millions tweets from 1000 users of Twitter. We show that MSLIM can efficiently select the most influential users for specific contagions.We also present several interesting patterns of the selected influential users.

[1]  Xi Chen,et al.  Smoothing Proximal Gradient Method for General Structured Sparse Learning , 2011, UAI.

[2]  Michael R. Lyu,et al.  Mining social networks using heat diffusion processes for marketing candidates selection , 2008, CIKM '08.

[3]  Yurii Nesterov,et al.  Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.

[4]  Jure Leskovec,et al.  Modeling Information Diffusion in Implicit Networks , 2010, 2010 IEEE International Conference on Data Mining.

[5]  Marc Teboulle,et al.  A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems , 2009, SIAM J. Imaging Sci..

[6]  Wei Chen,et al.  Scalable influence maximization for prevalent viral marketing in large-scale social networks , 2010, KDD.

[7]  Xi Chen,et al.  An Efficient Optimization Algorithm for Structured Sparse CCA, with Applications to eQTL Mapping , 2011, Statistics in Biosciences.

[8]  M. Yuan,et al.  Model selection and estimation in regression with grouped variables , 2006 .

[9]  Andreas Harth,et al.  Canonical Trends: Detecting Trend Setters in Web Data , 2012, ICML.

[10]  Michael I. Jordan,et al.  High-dimensional union support recovery in multivariate regression , 2008, NIPS 2008.

[11]  Man Lung Yiu,et al.  Group-by skyline query processing in relational engines , 2009, CIKM.

[12]  Éva Tardos,et al.  Maximizing the Spread of Influence through a Social Network , 2015, Theory Comput..

[13]  Éva Tardos,et al.  Influential Nodes in a Diffusion Model for Social Networks , 2005, ICALP.

[14]  Jieping Ye,et al.  Multi-Task Feature Learning Via Efficient l2, 1-Norm Minimization , 2009, UAI.

[15]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[16]  Sergey Brin,et al.  The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.

[17]  Matthew Richardson,et al.  Mining knowledge-sharing sites for viral marketing , 2002, KDD.

[18]  Duncan J. Watts,et al.  Everyone's an influencer: quantifying influence on twitter , 2011, WSDM '11.

[19]  Krishna P. Gummadi,et al.  Measuring User Influence in Twitter: The Million Follower Fallacy , 2010, ICWSM.

[20]  Xi Chen,et al.  Accelerated Gradient Method for Multi-task Sparse Learning Problem , 2009, 2009 Ninth IEEE International Conference on Data Mining.

[21]  Qi He,et al.  TwitterRank: finding topic-sensitive influential twitterers , 2010, WSDM '10.

[22]  Mark Steyvers,et al.  Finding scientific topics , 2004, Proceedings of the National Academy of Sciences of the United States of America.