The Problem of Centralizing Distributed Data Sources in the Regression Task

In this work we examine the effects of centralizing distributed data sources for automatic data analysis without taking into account that these sources may follow different underlying probability distributions. We compare one centralized approach with two distributed approaches on the distributed regression task. Experiments on synthetic and real data sets assess whether the distributed approaches outperform the classic centralized approach. The results indicate that, in most cases, the centralized approach yields worse results than the distributed ones.
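
The intuition behind this comparison can be illustrated with a toy example. The sketch below is not the method evaluated in this work: it assumes two hypothetical sources whose responses follow opposite linear laws, fits a single least-squares line on the pooled data (the centralized setting), and compares it with one line fitted per source (a simple stand-in for a distributed setting). The slopes, noise level, sample sizes, and helper names are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_source(slope, n=200):
    # Hypothetical source: y = slope * x + Gaussian noise (illustrative only).
    x = rng.uniform(-1.0, 1.0, size=n)
    y = slope * x + rng.normal(0.0, 0.1, size=n)
    return x, y

def fit_line(x, y):
    # Ordinary least squares for y ~ a * x + b; returns the array [a, b].
    design = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def mse(coef, x, y):
    # Mean squared error of the fitted line on (x, y).
    pred = coef[0] * x + coef[1]
    return float(np.mean((pred - y) ** 2))

# Two sources governed by different laws (assumed slopes +2 and -2).
slopes = [2.0, -2.0]
train = [make_source(s) for s in slopes]
test = [make_source(s) for s in slopes]

# Centralized: pool all training data and fit a single model.
x_all = np.concatenate([x for x, _ in train])
y_all = np.concatenate([y for _, y in train])
central_coef = fit_line(x_all, y_all)
centralized_mse = float(np.mean([mse(central_coef, x, y) for x, y in test]))

# Per-source: fit one model per source and evaluate it on that source's test data.
local_coefs = [fit_line(x, y) for x, y in train]
local_mse = float(np.mean([mse(c, x, y) for c, (x, y) in zip(local_coefs, test)]))

print(f"centralized test MSE: {centralized_mse:.4f}")
print(f"per-source test MSE:  {local_mse:.4f}")
```

In this constructed case the pooled model averages two opposing trends and fits neither source, so its test error is far larger than that of the per-source models. Real data sources rarely differ this sharply, which is why the comparison is carried out empirically.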