Upscaling in Situ Site-Based Albedo Using Machine Learning Models: Main Controlling Factors on Results

Validation of satellite albedo products is an essential step because their quantitative application depends on their ability to capture the actual state of the Earth's surface. Because in situ and satellite measurements differ in spatial scale, in situ measurements must be upscaled to the corresponding pixel scale before the two can be compared. Machine learning-based models have been increasingly used for upscaling because they can yield more reliable results than traditional methods. Nevertheless, the main controlling factors on upscaled results have rarely been discussed. This article explores the controlling factors that introduce uncertainty into upscaled results obtained with machine learning models. Three machine learning models, namely random forest (RF), $k$-nearest neighbor (KNN), and Cubist, were selected to upscale single-site in situ albedo to the coarse pixel scale. The upscaled results were assessed by comparison with a pixel-scale albedo reference. The results indicate that the accuracy of upscaled results depends on the machine learning model, the inclusion of key variables related to albedo, the choice of dataset for these variables, the amount of training data, and the sensitivity of each model to these factors. Despite this dependence on controlling factors, the machine learning-based upscaling methods are generally applicable across different spatial scales and over areas not used in training. They therefore open the door to generating time series of global, spatially continuous reference datasets with sufficient length, consistency, and continuity to fulfill the requirements of a comprehensive validation.
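As a rough illustration of the upscaling idea, the following minimal sketch trains a $k$-nearest neighbor regressor to map single-site albedo to a pixel-scale reference and compares it with using the site value directly. It is not the article's implementation: the data are synthetic, the predictor `aux` stands in for an unnamed albedo-related variable, and the linear relation, noise level, sample sizes, and `k=5` are all assumptions chosen only to make the example run.

```python
import math
import random


def knn_predict(train_X, train_y, query, k=5):
    """Average the targets of the k training samples closest to `query`."""
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_X, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def rmse(pred, ref):
    """Root-mean-square error between predictions and reference values."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def make_synthetic(n, seed):
    """Synthetic samples (assumption): (site albedo, aux) -> pixel albedo."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        site_albedo = rng.uniform(0.10, 0.40)   # single-site measurement
        aux = rng.uniform(0.1, 0.9)             # hypothetical auxiliary variable
        # Assumed relation between site scale and pixel scale, plus noise.
        pixel_albedo = 0.8 * site_albedo + 0.05 * (1 - aux) + rng.gauss(0, 0.005)
        X.append((site_albedo, aux))
        y.append(pixel_albedo)
    return X, y


train_X, train_y = make_synthetic(200, seed=0)
test_X, test_y = make_synthetic(50, seed=1)

# Upscaled estimate from the trained model versus the naive baseline that
# treats the site albedo as if it were the pixel albedo.
upscaled = [knn_predict(train_X, train_y, x, k=5) for x in test_X]
baseline = [x[0] for x in test_X]

rmse_upscaled = rmse(upscaled, test_y)
rmse_baseline = rmse(baseline, test_y)
print(f"RMSE upscaled: {rmse_upscaled:.4f}  RMSE baseline: {rmse_baseline:.4f}")
```

Rerunning the sketch with a smaller training set, or with `aux` dropped from the predictors, is one simple way to probe the kind of sensitivity to training data amount and variable selection that the abstract describes.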