An agent-based framework for distributed learning

This paper proposes an agent-based distributed learning framework built on data reduction. Data reduction aims to find patterns or regularities within certain features so that prototypes can be induced and retained for use during the learning process. The approach assumes that data reduction, through instance and feature selection, is carried out independently at each site by a team of agents. Obtaining homogeneous prototypes, that is, prototypes described by a common set of features, requires the feature selection to be coordinated, and the proposed approach provides this coordination through collaboration among agents. The resulting homogeneous prototypes can then be merged into a compact representation of the distributed data repositories, and a selected meta-learning technique can be applied to this representation to generate the global classifier. The paper proposes and explains strategies for agent collaboration that produce a common set of features, and strategies for constructing a combiner classifier. The suggested strategies are evaluated experimentally and compared. The paper includes a detailed description of the proposed approaches and a discussion of the results of the computational experiment.
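
The pipeline outlined above (local data reduction, agreement on a common feature set, merging of prototypes, and induction of a global classifier) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the authors' method: the function names are hypothetical, features are agreed on by simple voting over local proposals, class centroids stand in for instance selection, and a 1-nearest-neighbour rule over the merged prototypes stands in for the selected meta-learning technique.

```python
from collections import Counter, defaultdict


def propose_features(rows, labels, k):
    """Hypothetical local step: rank features by the spread of class means."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    scores = {}
    for f in range(len(rows[0])):
        means = [sum(r[f] for r in rs) / len(rs) for rs in by_class.values()]
        scores[f] = max(means) - min(means)
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]


def vote_common_features(proposals, k):
    """Assumed coordination rule: keep the k most frequently proposed features."""
    votes = Counter(f for proposal in proposals for f in proposal)
    ranked = sorted(votes, key=lambda f: (-votes[f], f))
    return sorted(ranked[:k])


def class_centroids(rows, labels, features):
    """Stand-in for instance selection: one mean prototype per class."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append([row[f] for f in features])
    return [
        ([sum(col) / len(col) for col in zip(*vectors)], label)
        for label, vectors in sorted(groups.items())
    ]


def predict_1nn(prototypes, x, features):
    """Global classifier stand-in: label of the nearest merged prototype."""
    v = [x[f] for f in features]

    def sq_dist(p):
        return sum((a - b) ** 2 for a, b in zip(p, v))

    return min(prototypes, key=lambda pl: sq_dist(pl[0]))[1]


# Two toy sites holding data with the same three features (made-up values).
site_a = ([[0.0, 5.0, 1.0], [0.2, 5.1, 0.9], [1.0, 5.0, 0.0], [0.9, 4.9, 0.1]],
          [0, 0, 1, 1])
site_b = ([[0.1, 3.0, 0.8], [0.0, 3.2, 1.0], [0.8, 3.1, 0.2], [1.0, 2.9, 0.0]],
          [0, 0, 1, 1])
sites = [site_a, site_b]

# 1. Each site proposes features independently; voting yields the common set.
proposals = [propose_features(rows, labels, k=2) for rows, labels in sites]
common = vote_common_features(proposals, k=2)

# 2. Each site reduces its data to prototypes over the common features.
# 3. Prototypes share a feature set, so they can be merged directly.
merged = [p for rows, labels in sites
          for p in class_centroids(rows, labels, common)]

# 4. A global classifier is built on the merged, compact representation.
prediction = predict_1nn(merged, [0.1, 9.9, 0.9], common)
```

Because every site reduces its data over the same agreed features, the merge in step 3 is a plain concatenation; without that agreement the prototypes from different sites would not be comparable.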
