Toward ubiquitous mining of distributed data

Data-centric information plays an increasingly important role in our everyday professional and personal lives. The advent of laptops, palmtops, handhelds, and wearable computers is also making ubiquitous access to large quantities of data possible. Advanced analysis of distributed data to extract useful knowledge is the next natural step in the world of ubiquitous computing. However, this will not come for free; it will introduce additional costs in communication, computation, and security, among others. Distributed data mining techniques offer a way to analyze distributed data while minimizing these costs, so that the ubiquitous presence can be maintained. This paper adopts the Collective Data Mining approach, which offers a collection of scalable, distributed data analysis techniques. It focuses in particular on two collective techniques for predictive data mining, presents some experimental results, and points readers toward more extensive documentation of the technology.
