Active sampling for graph-aware classification

The present work deals with data-adaptive active sampling of graph nodes that represent training data for binary classification. The graph may be given, or it may be constructed from similarity measures among nodal features. Leveraging the graph for classification builds on the premise that labels over neighboring nodes are correlated according to a categorical Markov random field (MRF). This model is further relaxed to a Gaussian MRF (GMRF) with labels taking continuous values, an approximation that not only mitigates the combinatorial complexity of the categorical model, but also offers optimal unbiased soft predictors of the unlabeled nodes. The proposed sampling strategy queries the node whose label disclosure is expected to inflict the largest mean-square deviation on the GMRF, a strategy that subsumes the existing variance-minimization-based sampling method. A simple yet effective heuristic is also introduced that increases the exploration capabilities of the sampler and reduces the bias of the resultant estimator by taking into account the confidence in the model's label predictions. The novel sampling strategy relies on quantities that are readily available without model retraining, which renders it scalable to large graphs. Numerical tests using synthetic and real data demonstrate that the proposed methods achieve accuracy comparable or superior to the state of the art, even at reduced runtime.
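
The selection rule described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes labels in {-1, +1}, a GMRF precision matrix equal to the graph Laplacian plus a small ridge `delta`, and a label probability derived from the clipped GMRF mean. The function names `gmrf_posterior` and `expected_change_scores` are hypothetical. Under these assumptions, revealing the label of node i shifts the predicted means by a rank-one Gaussian conditioning update, so the expected squared change can be scored from the current mean and covariance without retraining.

```python
import numpy as np

def gmrf_posterior(laplacian, labeled, labels, delta=1e-3):
    """Mean and covariance of the unlabeled nodes given the labeled ones.

    Assumption: the GMRF precision matrix is laplacian + delta * I.
    """
    n = laplacian.shape[0]
    unlabeled = np.setdiff1d(np.arange(n), labeled)
    precision = laplacian + delta * np.eye(n)
    # Gaussian conditioning: covariance is the inverse of the unlabeled block.
    cov_u = np.linalg.inv(precision[np.ix_(unlabeled, unlabeled)])
    mean_u = -cov_u @ precision[np.ix_(unlabeled, labeled)] @ labels
    return unlabeled, mean_u, cov_u

def expected_change_scores(mean_u, cov_u):
    """Expected squared change of the mean vector if each node were labeled.

    Revealing y_i moves the mean by cov_u[:, i] * (y_i - mean_i) / cov_u[i, i].
    For y_i in {-1, +1} with E[y_i] = mean_i, E[(y_i - mean_i)^2] = 1 - mean_i^2.
    """
    m = np.clip(mean_u, -1.0, 1.0)
    expected_sq_residual = 1.0 - m ** 2
    column_sq_norms = np.sum(cov_u ** 2, axis=0)
    return expected_sq_residual * column_sq_norms / np.diag(cov_u) ** 2

# Toy example: a 5-node path graph with the two end nodes labeled.
adjacency = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
labeled = np.array([0, 4])
labels = np.array([1.0, -1.0])

unlabeled, mean_u, cov_u = gmrf_posterior(laplacian, labeled, labels)
scores = expected_change_scores(mean_u, cov_u)
query = int(unlabeled[np.argmax(scores)])  # node to query next
```

In this toy graph the middle node is the one whose prediction is least certain, and it receives the highest score. Dropping the `expected_sq_residual` factor leaves a purely covariance-based score, which is one way to see how a variance-driven criterion can arise as a special case of an expected-change criterion.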
