Bias, noise, and interpretability in machine learning

Abstract

In machine learning studies of brain disorders, noise and bias are two major factors that can substantially affect both the performance of a model and its interpretability. In this chapter, we first provide a theoretical overview of how the decisions made at each step of the machine learning pipeline, particularly the choice of target and predictor variables, can influence the degree of bias and noise. We then discuss approaches to data processing that can be used to minimize the effects of bias and noise and to improve the performance and interpretability of the model. Finally, we illustrate some of these approaches with examples from the existing literature, focusing on three emerging themes in brain disorders research: the prediction of brain age from neuroimaging data, the use of language as a marker of illness, and the use of clinical variables for prognostic prediction. We conclude with a set of recommendations for future studies.