Pitfalls of machine learning for tail events in high risk environments

Most of today’s Machine Learning (ML) methods and implementations are based on correlations, in the sense of a statistical relationship between a set of inputs and the output(s) under investigation. The relationship might be obscure to the human mind, but through the use of ML, mathematics and statistics make it seemingly apparent. However, basing safety-critical decisions on such methods suffers from the same pitfalls as basing decisions on any other correlation metric that disregards causality. Causality is key to ensuring that applied mitigation tactics will actually affect the outcome in the desired way. This paper reviews the current situation and challenges of applying ML in high risk environments. It further outlines how phenomenological knowledge, together with an uncertainty-based risk perspective, can be incorporated to alleviate the missing causality considerations in current practice.

[…] do what they do might become a serious problem as computers become more responsible for making important decisions”. The problem of accidents in ML systems is discussed in detail in a paper led by Google Brain researchers Amodei, Olah, Steinhardt, Christiano, Schulman, & Mané (2016). Here the authors motivate the growing need to address these safety problems by pointing to some general trends, one of which relates to the increasing autonomy of AI systems: “Systems that simply output a recommendation to human users, such as speech systems, typically have relatively limited potential to cause harm. By contrast, systems that exert direct control over the world, such as machines controlling industrial processes, can cause harms in a way that humans cannot necessarily correct or oversee.” In this paper we turn our attention towards the introduction of ML in the design and operation of complex engineering systems in high risk environments. These are systems where today’s methods for assessing risk rely heavily on understanding the underlying physical processes and on our ability to model them.
This is in contrast to, e.g., the mass production of components, where more data-driven, statistically founded methods can be applied to estimate failure rates. We acknowledge the need for ML technologies to address the increasing complexity of engineering systems, and the challenges that follow in quantifying uncertainty and risk based on detailed numerical simulation. However, this type of application reduces the tolerance for erroneous model behavior. Most of the methods applied in ML are based