Distributed prediction from vertically partitioned data

We address the problem of prediction from vertically partitioned data, in which each local site holds some of the attributes of every record. This situation arises naturally when data is collected through physically separated channels. For distributed prediction, we show that a technique called attribute ensembles is simple, predicts almost as well as a centralized predictor, reduces the communication required, distributes computation and data access well, and allows each local site to keep its raw data private. We also show how to extend attribute ensembles to data that is partitioned both horizontally and vertically.
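
To make the setting concrete, the following is a minimal sketch of one way an ensemble over attribute subsets could work. It is not taken from the paper. It assumes that every site knows the class label of each training record, that each site trains a local model on its own attributes only, and that a coordinator combines the sites' predicted labels by majority vote. The nearest-centroid learner, the toy data, and all function names are hypothetical choices made for illustration.

```python
from collections import Counter
from statistics import mean


def train_centroids(rows, labels):
    """Fit a nearest-centroid model on one site's attribute slice (assumed learner)."""
    by_label = {}
    for row, label in zip(rows, labels):
        by_label.setdefault(label, []).append(row)
    # One centroid per class: the column-wise mean of that class's rows.
    return {label: [mean(col) for col in zip(*members)]
            for label, members in by_label.items()}


def predict_centroids(model, row):
    """Return the label whose centroid is closest to this site's slice of a record."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))


def majority_vote(votes):
    """Combine the local predictions; only labels cross the network, not raw data."""
    return Counter(votes).most_common(1)[0][0]


# Hypothetical toy data: three sites hold different attributes of the same four records.
site_columns = [
    [[0.1, 0.2], [0.0, 0.3], [0.9, 1.0], [1.1, 0.8]],  # site A: two attributes
    [[5.0], [5.2], [9.0], [8.7]],                      # site B: one attribute
    [[1.0, 0.0], [0.9, 0.1], [0.1, 1.0], [0.0, 0.9]],  # site C: two attributes
]
labels = ["low", "low", "high", "high"]  # assumed to be known at every site

# Each site trains locally on its own columns.
local_models = [train_centroids(cols, labels) for cols in site_columns]

# A new record arrives already split across the sites; each site votes on its slice.
new_record_slices = [[1.0, 0.9], [8.5], [0.8, 0.2]]
votes = [predict_centroids(model, piece)
         for model, piece in zip(local_models, new_record_slices)]
prediction = majority_vote(votes)
print(votes, prediction)
```

In this sketch each site sends a single label per record to the coordinator, which is how the design keeps communication low and leaves raw attribute values at the site that collected them.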
