Bayesian Optimal Active Search on Graphs

In many classification problems, including numerous examples on modern large-scale graph datasets, a large quantity of unlabeled data is available. Obtaining a label for such data can be very expensive, for example when human intervention is required. Both the semi-supervised and active learning communities have approached such problems. The former focuses on how to aid the classification task by exploiting the distribution of unlabeled data; the latter addresses the problem of choosing the most useful labels to acquire, so as to minimize the cost incurred in pursuit of the learning goal. These communities have proposed various algorithms that learn from a large dataset with few labeled data, and graph datasets in particular have received a lot of attention for large-scale applications such as collaborative filtering for recommendation systems and link prediction in social networks. Here, we focus on a specific active binary-classification problem, where the goal is to find the members of a particular class as quickly as possible. In some situations, such as fraud detection or the investigative analysis of potentially criminal social networks, only the members of the malicious class are sought, whereas obtaining labels for points in the other class is useful only insofar as it facilitates that task. We derive the Bayesian optimal policy for this decision problem and test our proposed algorithm on two large-scale graph datasets. The optimal policy can be implemented in parallel, and it is our hope that the described algorithm can scale to even larger graphs.
