Paper information - A Generic Datamining System. Basic Design and Implementation Guidelines

The aim of this work is to study the engineering of a generic datamining system, generic in the sense that it should integrate as many learning algorithms as possible. The system must also be capable of generating, by means of meta-learning, a decision mechanism that selects the most adequate algorithm for each datamining task, depending on basic features of the data set, the requirements of the user, and the background knowledge acquired in previous datamining sessions. To afford the integration of such a number of learning algorithms, the ideal processing platform must be distributed, for the sake of the system's scalability. The different challenges that arise are analyzed. The first is the engineering of a distributed system that assures scalability in order to integrate a potentially large number of machine learning algorithms. Another important problem is the definition of a common functionality for all machine learning problems, to ease the integration and management of algorithms. However, the most important task is meta-learning, because the features of algorithms and source data, user requirements, and metrics have to be formally defined. In addition, different machine learning performance metrics should be stated and combined.
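The decision mechanism described above can be illustrated with a small sketch. It is not the system proposed in the paper: the meta-features (log10 of the row count, number of attributes, number of classes), the `SessionRecord` fields, the weighted combination of accuracy and speed, and the nearest-neighbour lookup over past sessions are all assumptions chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict, List, Tuple

# Hypothetical data set meta-features: (log10 rows, attributes, classes).
MetaFeatures = Tuple[float, float, float]


@dataclass
class SessionRecord:
    """Outcome of one earlier datamining session (illustrative fields)."""
    features: MetaFeatures
    algorithm: str
    accuracy: float       # fraction in [0, 1]
    train_seconds: float  # wall-clock training time


def score(record: SessionRecord, accuracy_weight: float, speed_weight: float) -> float:
    """Combine two performance metrics into one value using user-supplied weights."""
    speed = 1.0 / (1.0 + record.train_seconds)  # maps time to (0, 1], faster is higher
    return accuracy_weight * record.accuracy + speed_weight * speed


def distance(a: MetaFeatures, b: MetaFeatures) -> float:
    """Euclidean distance between two meta-feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recommend(history: List[SessionRecord], features: MetaFeatures,
              accuracy_weight: float = 1.0, speed_weight: float = 0.0,
              k: int = 3) -> str:
    """Return the algorithm with the best mean score among the k most similar sessions."""
    if not history:
        raise ValueError("no previous sessions to learn from")
    nearest = sorted(history, key=lambda r: distance(r.features, features))[:k]
    scores: Dict[str, List[float]] = {}
    for record in nearest:
        scores.setdefault(record.algorithm, []).append(
            score(record, accuracy_weight, speed_weight))
    return max(scores, key=lambda name: sum(scores[name]) / len(scores[name]))


# Made-up background knowledge from four earlier sessions.
HISTORY = [
    SessionRecord((3.0, 10.0, 2.0), "decision_tree", 0.90, 1.0),
    SessionRecord((3.1, 12.0, 2.0), "naive_bayes", 0.85, 0.0),
    SessionRecord((6.0, 50.0, 5.0), "neural_net", 0.95, 99.0),
    SessionRecord((5.9, 48.0, 5.0), "naive_bayes", 0.80, 1.0),
]

if __name__ == "__main__":
    # A user who cares only about accuracy, with a small two-class data set.
    print(recommend(HISTORY, (3.0, 11.0, 2.0), k=2))
    # The same data set, with accuracy and speed weighted equally.
    print(recommend(HISTORY, (3.0, 11.0, 2.0), 0.5, 0.5, k=2))
```

Changing the weights changes the recommendation for the same data set, which is one simple way user requirements and combined performance metrics could enter the decision.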