Towards Interpretation of Pairwise Learning

Recently, increasing attention has been paid to an important family of learning problems called pairwise learning, in which the associated loss functions depend on pairs of instances. Despite the tremendous success of pairwise learning in many real-world applications, the lack of transparency behind learned pairwise models makes it difficult for users to understand how these models reach particular decisions, which in turn impedes users from trusting the predicted results. To tackle this problem, in this paper we study feature importance scoring as a specific approach to interpreting the predictions of black-box pairwise models. Specifically, we first propose a novel adaptive Shapley-value-based interpretation method that calculates a vector of importance scores for the features of a testing instance pair while taking feature correlations into account; these scores indicate which features contribute most to the final prediction. Since Shapley-value-based methods are usually computationally expensive, we further propose a novel robust approximation method for interpreting pairwise models that is not only much more efficient but also robust to data noise. To the best of our knowledge, we are the first to investigate how to enable interpretation in pairwise learning. Theoretical analysis and extensive experiments demonstrate the effectiveness of the proposed methods.
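To make the Shapley-value-based scoring concrete, the sketch below estimates per-feature importance for a black-box pairwise model via permutation sampling (the classic Monte Carlo estimator in the style of Štrumbelj and Kononenko). All names here (`shapley_sampling`, `model`, the zero baseline) are illustrative, and this baseline estimator assumes independent features; it is not the paper's adaptive, correlation-aware method, only a generic reference point for it.

```python
import numpy as np

def shapley_sampling(model, x1, x2, baseline1, baseline2,
                     n_samples=200, seed=0):
    """Monte Carlo estimate of per-feature Shapley values for a
    black-box pairwise model ``model(x1, x2) -> float``.

    Features outside the current coalition are replaced by baseline
    values (e.g. training means) in *both* instances of the pair.
    """
    rng = np.random.default_rng(seed)
    d = x1.shape[0]
    phi = np.zeros(d)
    for _ in range(n_samples):
        perm = rng.permutation(d)        # random feature ordering
        z1, z2 = baseline1.copy(), baseline2.copy()
        prev = model(z1, z2)             # value of the empty coalition
        for i in perm:
            z1[i], z2[i] = x1[i], x2[i]  # add feature i to the coalition
            cur = model(z1, z2)
            phi[i] += cur - prev         # marginal contribution of i
            prev = cur
    return phi / n_samples               # average over sampled orderings

# Toy usage: score the features of an instance pair under a similarity model.
f = lambda a, b: -np.linalg.norm(a - b)  # hypothetical pairwise scorer
x1 = np.array([1.0, 0.0, 2.0])
x2 = np.array([0.5, 0.0, -1.0])
base = np.zeros(3)
print(shapley_sampling(f, x1, x2, base, base))
```

Each sampled permutation costs d + 1 model evaluations, which is exactly the overhead that motivates the paper's more efficient and noise-robust approximation method.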
