Distributed data mining and agents

Multi-agent systems (MAS) offer an architecture for distributed problem solving. Distributed data mining (DDM) algorithms focus on one class of such distributed problem-solving tasks: the analysis and modeling of distributed data. This paper offers a perspective on DDM algorithms in the context of multi-agent systems. It discusses the broad connection between DDM and MAS, provides a high-level survey of DDM, and then focuses on distributed clustering algorithms and some of their potential applications in multi-agent-based problem-solving scenarios. It reviews algorithms for distributed clustering, including privacy-preserving ones. It describes the challenges of clustering in sensor-network environments, potential shortcomings of current algorithms, and corresponding directions for future work. It also discusses confidentiality (privacy preservation) and presents a new algorithm for privacy-preserving density-based clustering.
