Adapting the Weka Data Mining Toolkit to a Grid Based Environment

Data mining plays a key role in most enterprises, which must analyse large amounts of data in order to increase profits. However, the large datasets involved in this process pose technological challenges for the data mining field. Grid computing takes advantage of the low-load periods of the computers connected to a network, making resource and data sharing possible. Providing Grid services is a flexible way of tackling data mining needs. This paper describes the adaptation of Weka, a widely used data mining tool, to a grid infrastructure.
