Hypotheses, machine learning and soil mapping

Abstract

Hypotheses are of major importance in scientific research. In current applications of machine learning algorithms for soil mapping, the hypotheses being tested or developed are often ambiguous or undefined. Mapping soil properties or classes, however, does not tell us much about the dynamics and processes that underlie soil genesis and evolution. When the soil map is intended for applications outside soil science, such as policy making or baseline production of quantitative soil information, it should be interpreted in light of that application. Otherwise, we recommend that soil scientists provide hypotheses to accompany their research. Conventionally, the hypothesis is formulated at the beginning of the research and, in some cases, motivates data collection. Here we argue that when applying data-driven techniques such as machine learning, developing hypotheses can be a useful end point of the research. The spatial pattern predicted by the machine learning model and the correlation found among the covariates are an opportunity to develop hypotheses, which are likely to require additional analyses and datasets to be tested. Systematically providing scientific hypotheses in digital soil mapping studies will enable the soil science community to build on previous work and will increase the credibility of data-driven algorithms as a means to accelerate discovery about soil processes.