Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims

With the recent wave of progress in artificial intelligence (AI) has come a growing awareness of the large-scale impacts of AI systems, and recognition that existing regulations and norms in industry and academia are insufficient to ensure responsible AI development. In order for AI developers to earn trust from system users, customers, civil society, governments, and other stakeholders that they are building AI responsibly, they will need to make verifiable claims to which they can be held accountable. Those outside of a given organization also need effective means of scrutinizing such claims. This report suggests various steps that different stakeholders can take to improve the verifiability of claims made about AI systems and their associated development processes, with a focus on providing evidence about the safety, security, fairness, and privacy protection of AI systems. We analyze ten mechanisms for this purpose--spanning institutions, software, and hardware--and make recommendations aimed at implementing, exploring, or improving those mechanisms.

Peter Henderson | Yoshua Bengio | Ben Laurie | Tegan Maharaj | Adrian Weller | Rosario Cammarota | Sarah de Haas | Noa Zilberman | Peter Eckersley | Mark Koren | Carina Prunkl | Miles Brundage | Elizabeth Barnes | Shagun Sodhani | Shahar Avin | Maritza Johnson | Heidy Khlaaf | Girish Sastry | Allan Dafoe | Ruth Fong | Pang Wei Koh | Ariel Herbert-Voss | Charlotte Stix | Sara Hooker | Paul Scharre | Thomas Krendl Gilbert | Gillian Hadfield | Jasmine Wang | Haydn Belfield | Gretchen Krueger | Jingying Yang | Helen Toner | Jade Leung | Andrew Trask | Emma Bluemke | Jonathan Lebensold | Cullen O'Keefe | Théo Ryffel | JB Rubinovitz | Tamay Besiroglu | Federica Carugati | Jack Clark | Alex Ingerman | Igor Krawczuk | Amanda Askell | Andrew Lohn | David Krueger | Logan Graham | Bianca Martin | Elizabeth Seger | Seán Ó hÉigeartaigh | Frens Kroeger | Rebecca Kagan | Brian Tse | Martijn Rasser | Carrick Flynn | Lisa Dyer | Saif Khan | Markus Anderljung
