Statistical Computing and Databases: Distributed Computing Near the Data

This paper addresses the following question: “how do we fit statistical models efficiently with very large data sets that reside in databases?” Nowadays it is quite common to encounter a situation where a very large data set is stored in a database, yet the statistical analysis is performed with a separate piece of software such as R. Usually it does not make much sense, and in some cases it may not even be possible, to move the data from the database manager into the statistical software in order to complete a statistical