Towards international standards for evaluating machine learning

Various international efforts to standardize artificial intelligence have begun. Many of them involve issues of privacy, trustworthiness, safety, and public wellbeing, topics on which international consensus does not necessarily exist and may not for the foreseeable future. Meanwhile, the pursuit of state-of-the-art accuracy in machine learning has led to a somewhat ad hoc application of empirical methodology, which may limit the correctness of how those accuracies are computed and make the applicability of the resulting models unpredictable. Trusting the objective quantitative performance of our systems is itself a safety concern and should inform the earliest standards to-