A Generic Datamining System. Basic Design and Implementation Guidelines

The aim of this work is to study the engineering of a generic datamining system, generic in the sense that it must try to integrate as many learning algorithms as possible. At the same time, the system must be capable of generating, by means of meta-learning, a decision mechanism that selects the most adequate algorithm for each datamining task, depending on basic features of the data set, the requirements of the user and the background knowledge acquired in previous datamining sessions. To afford the integration of such a number of learning algorithms, the ideal processing platform must be distributed, for the sake of the system's scalability. The different challenges that arise are analyzed. The first is the engineering of a distributed system that assures scalability, in order to integrate a potentially large number of machine learning algorithms. Another important problem is the definition of a common functionality for all machine learning problems, to ease the integration and management of algorithms. However, the most important task is meta-learning, because the features of algorithms and source data, user requirements and metrics have to be formally defined. In addition, different machine learning performance metrics should be stated and combined.
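As an illustration only, the decision mechanism described above could be sketched as a nearest-neighbour lookup over past sessions: a new data set is summarized by a few basic features, and the algorithm that performed best on the most similar earlier data set is recommended. Everything in this sketch is an assumption made for the example and not the design proposed in this work: the names `SessionRecord`, `describe` and `recommend`, the three meta-features, the Euclidean distance and the algorithm labels are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class SessionRecord:
    """One past datamining session (hypothetical structure).

    Stores the meta-features of the data set and the name of the
    algorithm that performed best on it.
    """
    meta_features: Dict[str, float]
    best_algorithm: str


def describe(rows: Sequence[Sequence[float]], labels: Sequence[int]) -> Dict[str, float]:
    """Compute a few basic data set features (an assumed, minimal choice)."""
    if not rows:
        raise ValueError("the data set must contain at least one instance")
    return {
        "log_instances": math.log10(len(rows)),
        "n_attributes": float(len(rows[0])),
        "n_classes": float(len(set(labels))),
    }


def distance(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Euclidean distance between two meta-feature vectors with the same keys."""
    return math.sqrt(sum((a[key] - b[key]) ** 2 for key in a))


def recommend(meta: Dict[str, float], history: List[SessionRecord]) -> str:
    """Return the best algorithm of the most similar past session."""
    if not history:
        raise ValueError("no background knowledge available yet")
    closest = min(history, key=lambda record: distance(meta, record.meta_features))
    return closest.best_algorithm


if __name__ == "__main__":
    # Hypothetical background knowledge from two earlier sessions.
    history = [
        SessionRecord({"log_instances": 2.0, "n_attributes": 3.0, "n_classes": 2.0}, "decision_tree"),
        SessionRecord({"log_instances": 5.0, "n_attributes": 50.0, "n_classes": 10.0}, "naive_bayes"),
    ]
    # A small two-class data set with two attributes.
    rows = [[float(i), float(i % 2)] for i in range(100)]
    labels = [i % 2 for i in range(100)]
    print(recommend(describe(rows, labels), history))
```

A real system would also have to weigh user requirements and combine several performance metrics, which this sketch leaves out; it only shows how data set features and knowledge from previous sessions can drive the choice.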