Increasing Trust in AI Services through Supplier's Declarations of Conformity

The accuracy and reliability of machine learning algorithms are important concerns for suppliers of artificial intelligence (AI) services, but considerations beyond accuracy, such as safety, security, and provenance, are also critical to engendering consumers' trust in a service. In this paper, we propose a supplier's declaration of conformity (SDoC) for AI services as a means of increasing that trust. An SDoC is a transparent, standardized, but often not legally required, document used in many industries and sectors to describe a product's lineage along with the safety and performance testing it has undergone. We envision an SDoC for AI services that contains purpose, performance, safety, security, and provenance information, to be completed and voluntarily released by AI service providers for examination by consumers. Importantly, it conveys product-level rather than component-level functional testing. We suggest a set of declaration items tailored to AI and provide examples for two fictitious AI services.
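To make the envisioned declaration concrete, the sketch below models an SDoC as a machine-readable record in Python. The `AIServiceSDoC` class, its field names, and the example values are hypothetical illustrations: the abstract specifies only the information categories (purpose, performance, safety, security, provenance), not any particular schema.

```python
# Hypothetical sketch of an SDoC for an AI service as a machine-readable
# record. The schema and example values are illustrative assumptions; the
# paper names the categories but does not prescribe this structure.
from dataclasses import dataclass, field


@dataclass
class AIServiceSDoC:
    """Supplier's declaration of conformity for an AI service."""
    service_name: str
    purpose: str                                     # intended use and domain of the service
    performance: dict = field(default_factory=dict)  # e.g., accuracy metrics and test conditions
    safety: dict = field(default_factory=dict)       # e.g., known failure modes, bias testing
    security: dict = field(default_factory=dict)     # e.g., adversarial robustness checks
    provenance: dict = field(default_factory=dict)   # e.g., training data lineage


# Example declaration for a fictitious service, mirroring the paper's
# use of fictitious AI services as examples.
sdoc = AIServiceSDoC(
    service_name="ExampleVisionAPI",
    purpose="General-purpose image tagging; not intended for medical use",
    performance={"top-1 accuracy": 0.91, "test set": "held-out internal benchmark"},
    safety={"bias audit": "disparate impact checked across demographic groups"},
    security={"adversarial testing": "evaluated against common evasion attacks"},
    provenance={"training data": "licensed stock imagery, collected 2017-2018"},
)
print(sdoc.service_name, "-", sdoc.purpose)
```

Because the fields are plain data, such a record could be serialized (for example, to JSON) and voluntarily published by a service provider for consumers to examine, in the spirit of the product-level testing the paper emphasizes.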
