Query-driven Active Surveying for Collective Classification

In network classification problems such as those arising in intelligence gathering, public health, and viral marketing, one is often interested only in inferring the labels of a subset of the nodes. We call this subset the query set and refer to the task as query-driven collective classification. We study the problem in a practical active learning framework in which the learning algorithm can survey non-query nodes to obtain their labels and their network structure. We derive a surveying strategy aimed at optimal inference on the query set. Considering both feature smoothness and structural smoothness, concepts we formally define, we develop an algorithm that adaptively selects survey nodes by estimating which form of smoothness is most appropriate. We evaluate the algorithm on several network datasets and demonstrate improvements over standard active learning methods.
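The abstract does not say how the two kinds of smoothness are measured or how the estimate drives node selection. The Python sketch below is one hypothetical reading, not the authors' algorithm. It assumes three things. First, structural smoothness is the fraction of edges among surveyed nodes whose endpoints share a label. Second, feature smoothness is the leave-one-out 1-nearest-neighbour label agreement in feature space. Third, the next survey node is chosen by a simple greedy rule under whichever score is higher. The function names `structural_smoothness`, `feature_smoothness` and `choose_survey_node` are illustrative only.

```python
import math
from typing import Dict, Iterable, Optional, Sequence, Set

# Hypothetical data layout (assumption, not from the paper):
Adjacency = Dict[int, Set[int]]        # undirected graph: node -> neighbours
Features = Dict[int, Sequence[float]]  # node -> feature vector
Labels = Dict[int, int]                # surveyed node -> class label


def structural_smoothness(adj: Adjacency, labels: Labels) -> float:
    """Share of edges between two labeled nodes whose endpoints agree."""
    same = total = 0
    for u, nbrs in adj.items():
        for v in nbrs:
            # Count each undirected edge once and skip unlabeled endpoints.
            if u < v and u in labels and v in labels:
                total += 1
                same += labels[u] == labels[v]
    return same / total if total else 0.0


def feature_smoothness(features: Features, labels: Labels) -> float:
    """Leave-one-out 1-nearest-neighbour label agreement in feature space."""
    nodes = list(labels)
    if len(nodes) < 2:
        return 0.0
    hits = 0
    for u in nodes:
        # Nearest other labeled node by Euclidean distance.
        nearest = min((v for v in nodes if v != u),
                      key=lambda v: math.dist(features[u], features[v]))
        hits += labels[u] == labels[nearest]
    return hits / len(nodes)


def choose_survey_node(adj: Adjacency, features: Features, labels: Labels,
                       query: Set[int], candidates: Iterable[int]) -> Optional[int]:
    """Hypothetical greedy rule for picking the next non-query node to survey."""
    pool = [c for c in candidates if c not in labels and c not in query]
    if not pool:
        return None
    if structural_smoothness(adj, labels) >= feature_smoothness(features, labels):
        # Links look more predictive: prefer the node adjacent to most query nodes.
        return max(pool, key=lambda c: len(adj.get(c, set()) & query))
    # Features look more predictive: prefer the node closest to the query set.
    return min(pool, key=lambda c: sum(math.dist(features[c], features[q]) for q in query))
```

Ties between the two scores go to the structural rule. Both greedy rules are placeholders. A strategy aimed at optimal inference on the query set would instead score each candidate by its expected effect on the query-set predictions.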
