Privacy-preserving agent-based distributed data clustering

A growing number of applications in distributed environments involve very large data sets that are inherently distributed among many autonomous sources over a network. The demand to extend data mining technology to such data sets has motivated several approaches to distributed data mining and knowledge discovery, of which only a few make use of agents. We briefly review existing approaches and argue for the potential added value of agent technology in knowledge discovery, discussing both issues and benefits. We also propose an approach to distributed data clustering, outline its agent-oriented implementation, and examine potential privacy-violating attacks that agents may face.
