When Trusted Black Boxes Don't Agree: Incentivizing Iterative Improvement and Accountability in Critical Software Systems

Software increasingly plays a key role in regulated domains such as housing, hiring, and credit, as well as in major public functions including criminal justice and elections. Unintended defects in such software can easily arise and can have a large impact on the lives of individuals and on society as a whole. Preventing, finding, and fixing software defects is a central focus of both industrial software development and academic research in software engineering. In this paper, we discuss flaws in the larger socio-technical decision-making processes in which critical black-box software systems are developed, deployed, and trusted. We use criminal justice software, specifically probabilistic genotyping (PG) software, as a concrete example. We describe how PG software systems, designed to perform the same task, produce different results. We highlight the under-appreciated impact of changes in key parameters and the disparate impact that one such parameter can have on different racial/ethnic groups. We propose concrete changes to the socio-technical decision-making processes surrounding the use of PG software that could incentivize iterative improvement in the accuracy, fairness, reliability, and accountability of these systems.