Face Recognition Vendor Test (FRVT) Part 8: Summarizing Demographic Differentials

In 2019, NIST Interagency Report 8280 quantified and visualized demographic variation across many face recognition algorithms. The report also suggested several mitigations, one of which, the focus of this report, was to define summary inequity measures that developers can work to improve and that can guide algorithm selection. Since 2019, it has become apparent that false negative inequities stem substantially from poor photography of certain groups, including under-exposure of dark-skinned individuals, and that this can be addressed by using algorithms more tolerant of poor image quality or, better, by correcting the capture process with better cameras, imaging environments, and human factors. At the same time, it is clear that the much larger false positive variations, which occur even in high-quality photographs, must be mitigated by the algorithm developers themselves. To those ends, this report compiles and analyzes various demographic summary measures of how face recognition false positive and false negative error rates differ across age, sex, and race-based demographic groups. We exercise some of the proposed measures by tabulating them for many algorithms submitted to the one-to-one comparison track of the Face Recognition Vendor Test; those results appear on a regularly updated public webpage. This document is open for public comment until 2022-09-14; comments should be directed to frvt@nist.gov. The measures in this report, and potentially others, are being considered for inclusion in the ISO/IEC 19795-10 standard on measurement and reporting of demographic effects in biometric systems, in particular toward forming standardized measures of disparate impact.
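To make the idea of a summary inequity measure concrete, the following is a minimal, hypothetical Python sketch: given per-group false match rates measured at a single fixed threshold, it reports the worst-to-best ratio as one inequity number. The group labels, rates, and the summary_ratio function are invented for illustration and are not the report's official definitions.

# Illustrative sketch only; not the official FRVT Part 8 measure.
# Given per-group error rates (e.g., FMR or FNMR) measured at one fixed
# threshold, summarize their spread as the ratio of the worst (highest)
# to the best (lowest) group rate.

def summary_ratio(rates_by_group: dict[str, float], floor: float = 1e-6) -> float:
    """Worst-to-best ratio of per-group error rates.

    A value of 1.0 indicates no measured differential; larger values
    indicate greater inequity. `floor` guards against division by zero
    when a group records no errors in the available sample.
    """
    worst = max(rates_by_group.values())
    best = max(min(rates_by_group.values()), floor)
    return worst / best

# Hypothetical per-group false match rates at a threshold calibrated to an
# aggregate FMR of 1e-4; the group names and values are placeholders.
fmr = {"group_A": 2.0e-4, "group_B": 0.5e-4, "group_C": 1.1e-4}
print(f"FMR worst/best ratio: {summary_ratio(fmr):.1f}")  # prints 4.0

Ratio-style summaries such as this one emphasize relative disparities between groups; other functional forms discussed in the literature, such as worst-case differences or geometric-mean normalizations, weight the groups differently.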
