Active learning for networked data based on non-progressive diffusion model

We study active learning for networked data, where samples are connected by links and their labels are correlated. We focus on the setting in which a probabilistic graphical model is used to model the networked data, because such models effectively capture the dependency between the labels of linked samples. We propose connecting the graphical model to the information diffusion process, and formally define the active learning problem based on the non-progressive diffusion model. We show that the problem is NP-hard and propose a method called MaxCo to solve it. We derive a lower bound for the optimal solution in the active learning setting, and develop an iterative greedy algorithm with provable approximation guarantees. We also prove the convergence and correctness of MaxCo. We evaluate MaxCo on four datasets of different genres: Coauthor, Slashdot, Mobile, and Enron. Our experiments show a consistent improvement over competing approaches.
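To make the link between label querying and diffusion concrete, the following is a minimal illustrative sketch, not the MaxCo algorithm itself. It assumes a simple synchronous threshold rule as the non-progressive diffusion model: a queried (labeled) node stays active, while any other node is active in a round only if at least a fraction `threshold` of its neighbors was active in the previous round, so nodes may also become inactive again. The names `simulate`, `greedy_select`, `threshold`, and `rounds` are hypothetical and chosen only for this example.

```python
from typing import Dict, Hashable, List, Set

Graph = Dict[Hashable, List[Hashable]]  # adjacency lists, assumed undirected


def simulate(graph: Graph, seeds: Set[Hashable],
             threshold: float = 0.5, rounds: int = 10) -> Set[Hashable]:
    """Run an assumed synchronous non-progressive threshold diffusion.

    Seeds (queried nodes) are always active. Every other node is
    re-evaluated each round, so it can switch from active to inactive.
    """
    active = set(seeds)
    for _ in range(rounds):
        nxt = set(seeds)
        for node, nbrs in graph.items():
            if node in seeds or not nbrs:
                continue
            frac = sum(n in active for n in nbrs) / len(nbrs)
            if frac >= threshold:
                nxt.add(node)
        if nxt == active:  # fixed point reached
            break
        active = nxt
    return active


def greedy_select(graph: Graph, budget: int,
                  threshold: float = 0.5, rounds: int = 10) -> List[Hashable]:
    """Greedily pick up to `budget` nodes to query, each time adding the
    node that yields the largest active set under `simulate`."""
    chosen: List[Hashable] = []
    for _ in range(budget):
        best, best_size = None, -1
        for cand in graph:
            if cand in chosen:
                continue
            size = len(simulate(graph, set(chosen) | {cand}, threshold, rounds))
            if size > best_size:
                best, best_size = cand, size
        if best is None:  # no unchosen nodes left
            break
        chosen.append(best)
    return chosen


# Usage: a three-node path a - b - c
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
covered = simulate(path, {"a"})
picked = greedy_select(path, budget=1)
```

In this toy reading, an "active" node stands for a sample whose label can be inferred from its neighbors, which is an interpretation adopted for the sketch rather than a definition taken from the paper. The plain greedy loop here carries no approximation guarantee on its own; any such guarantee depends on properties of the objective that the paper establishes for its own formulation.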
