Collective Data Mining: A New Perspective toward Distributed Data Mining
Advances in Distributed Data Mining Book

This paper introduces collective data mining (CDM), a new approach to distributed data mining (DDM) from heterogeneous sites. It points out that naive approaches to distributed data analysis in a heterogeneous environment may face ambiguous situations and may lead to an incorrect global data model. It also observes that any function can be expressed in a distributed fashion using a set of appropriate basis functions, and that orthonormal basis functions can be effectively used to develop a general framework for DDM that guarantees correct local analysis, resulting in the desired global data model with minimal data communication. The paper develops the foundation of CDM, discusses decision tree learning and polynomial regression in CDM for discrete and continuous variables, and describes BODHI, a CDM-based experimental system.
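The observation that a function can be expressed through an orthonormal basis can be illustrated with a minimal sketch. The abstract does not name a specific basis here, so the Walsh basis over binary features is an assumption, and the function `target` and all helper names are hypothetical. The sketch only shows that the coefficients computed by projection reproduce the function exactly; it is not the paper's CDM procedure.

```python
from itertools import product

def walsh(j, x):
    """Walsh basis function psi_j(x) = (-1)^(j . x) for bit tuples j and x."""
    return -1 if sum(a & b for a, b in zip(j, x)) % 2 else 1

def walsh_coefficients(f, n):
    """Project f onto every Walsh basis function over {0,1}^n."""
    points = list(product((0, 1), repeat=n))
    return {
        j: sum(f(x) * walsh(j, x) for x in points) / len(points)
        for j in points
    }

def reconstruct(coeffs, x):
    """Evaluate the basis expansion sum_j w_j * psi_j(x)."""
    return sum(w * walsh(j, x) for j, w in coeffs.items())

def target(x):
    # Hypothetical function of three binary features (an assumption for
    # illustration, not taken from the paper).
    return 2 * x[0] + 3 * x[1] * x[2]

coeffs = walsh_coefficients(target, 3)
max_error = max(
    abs(reconstruct(coeffs, x) - target(x))
    for x in product((0, 1), repeat=3)
)
```

Because the basis functions are orthogonal, each coefficient is obtained independently of the others, which is the property the abstract relies on when it speaks of correct local analysis.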
