Distributed learning: regression on attribute-distributed data and consensus clustering

This dissertation compiles four studies united by their relevance to attribute-distributed learning, both supervised (regression) and unsupervised (clustering). Regression on attribute-distributed data is discussed first. The theoretical performance limits of a linear ensemble estimator are investigated, and an iterative training protocol with low test error and high robustness to irrelevant agents is proposed. To quantify the trade-off between communication and performance in regression on attribute-distributed data, an iterative training algorithm based on inaccurate estimates of the covariance matrix of the individual training residuals is designed and tested under different amounts of data exchange. To reduce data exchange and the negative influence of irrelevant agents, a heuristic-based intelligent agent selection algorithm is proposed and tested. Finally, motivated partly by attribute-distributed clustering problems, the Filtered Stochastic Best-One-Element-Move (BOEM) algorithm is designed and investigated; it is more computationally efficient and yields better final results than other local search algorithms for consensus clustering.