Meta-learning in grid-based data mining systems

The Weka4GML framework is designed to meet the requirements of distributed data mining. In this paper, we present the Weka4GML architecture, which is based on WSRF technology and supports the development of meta-learning methods for datasets distributed across a data grid. The framework extends the Weka toolkit to support the distributed execution of data mining methods such as meta-learning. We describe the architecture and behaviour of the proposed framework and detail the steps needed to execute a meta-learning process in a Globus environment. Finally, we discuss the framework and compare it with related work.
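As a rough illustration of the meta-learning idea mentioned above, the following minimal sketch trains one base classifier per data partition and combines their predictions by majority vote. It is not the Weka4GML API and does not use WSRF or Globus: the partitions, the threshold "stump" learner, and the names `train_stump` and `combine_by_vote` are all hypothetical, and majority voting is just one simple combination strategy chosen for brevity.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, int]           # (feature value, class label 0 or 1)
Classifier = Callable[[float], int]


def train_stump(partition: Sequence[Sample]) -> Classifier:
    """Train a one-feature threshold classifier on one site's local data."""
    best_threshold, best_errors = 0.0, len(partition) + 1
    for threshold, _ in partition:
        # Candidate rule: predict 1 when the feature is >= threshold, else 0.
        errors = sum((x >= threshold) != bool(y) for x, y in partition)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return lambda x, t=best_threshold: int(x >= t)


def combine_by_vote(classifiers: List[Classifier]) -> Classifier:
    """Build a global classifier that majority-votes over the base classifiers."""
    def predict(x: float) -> int:
        votes = Counter(clf(x) for clf in classifiers)
        return votes.most_common(1)[0][0]
    return predict


# Hypothetical partitions, standing in for datasets held at three grid nodes.
partitions = [
    [(1.0, 0), (2.0, 0), (6.0, 1), (7.0, 1)],
    [(0.5, 0), (3.0, 0), (5.0, 1), (8.0, 1)],
    [(1.5, 0), (2.5, 0), (5.5, 1), (9.0, 1)],
]

# Each base classifier sees only its own partition; only the trained
# models (not the raw data) are needed to build the global classifier.
base_classifiers = [train_stump(p) for p in partitions]
global_classifier = combine_by_vote(base_classifiers)
```

In this sketch only the trained base classifiers are gathered at the combining step, so the raw partitions never leave their sites; in a grid setting each `train_stump` call would presumably run as a remote service invocation on the node that holds the data.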
