Why Interpretability in Machine Learning? An Answer Using Distributed Detection and Data Fusion Theory

As artificial intelligence increasingly affects all parts of society and life, there is growing recognition that human interpretability of machine learning models is important. It is often argued that accuracy or similar generalization performance metrics must be sacrificed to gain interpretability. Such arguments, however, fail to acknowledge that the overall decision-making system is composed of two entities: the learned model and a human who fuses the model's outputs with his or her own information. As such, the relevant performance criteria should apply to the entire system, not just to the machine learning component. In this work, we characterize the performance of such two-node tandem data fusion systems using the theory of distributed detection. In doing so, we work in the population setting and model interpretable learned models as multi-level quantizers. We prove that under our abstraction, the overall system of a human with an interpretable classifier outperforms one with a black-box classifier.
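The tandem abstraction can be illustrated with a small Monte Carlo sketch (not from the paper; the Gaussian hypotheses, quantizer thresholds, and sample sizes below are illustrative assumptions). A "model" node quantizes its observation and sends the message to a "human" node, who fuses the message likelihood with an independent observation of its own. An interpretable model is represented as a multi-level quantizer, a black box as a one-bit (label-only) quantizer:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)

def ncdf(x, mu):
    # CDF of N(mu, 1) at x
    return 0.5 * (1.0 + erf((x - mu) / sqrt(2.0)))

def bin_probs(edges, mu):
    # P(quantizer output = each bin | observation ~ N(mu, 1))
    e = [-np.inf] + list(edges) + [np.inf]
    return np.array([ncdf(e[i + 1], mu) - ncdf(e[i], mu) for i in range(len(e) - 1)])

def tandem_error(edges, n=200_000):
    # Hypothesis H in {0, 1}; class-conditional means -1 and +1 (assumed setup)
    h = rng.integers(0, 2, n)
    mu = np.where(h == 1, 1.0, -1.0)
    xm = rng.normal(mu, 1.0)          # model node's observation
    xh = rng.normal(mu, 1.0)          # human node's observation (cond. independent)
    msg = np.digitize(xm, edges)      # quantized message sent to the human
    p0, p1 = bin_probs(edges, -1.0), bin_probs(edges, 1.0)
    # Human fuses the message log-likelihood ratio with its own;
    # for unit-variance Gaussians, log N(x; 1)/N(x; -1) = 2x.
    llr = np.log(p1[msg] / p0[msg]) + 2.0 * xh
    dec = (llr > 0).astype(int)
    return float(np.mean(dec != h))

one_bit = tandem_error([0.0])                # black box: binary label only
multi = tandem_error([-1.0, 0.0, 1.0])       # interpretable: 4-level quantizer
```

Because the 4-level quantizer refines the one-bit one (its thresholds include 0), the human's fused decision with the multi-level message is never worse, mirroring the paper's claim in this toy setting.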
