Analysis and Design of Service-Oriented Framework for Executing Data Mining Services on Grids

Data mining services on grids are increasingly needed. Workflow environments are widely used in data mining systems to manage the data and execution flows associated with complex applications. Weka, one of the most widely used open-source data mining systems, includes the Knowledge Flow environment, which provides a drag-and-drop interface for composing and executing data mining workflows. For the sake of simplicity, however, it allows users to execute a whole workflow only on a single computer. Several other workflow systems are also in use today. Most data mining workflows include several independent branches that could be run in parallel on a set of distributed machines to reduce the overall execution time. We analyze several aspects of distributed workflow execution in Weka4WS, a framework that extends Weka and its Knowledge Flow environment to exploit distributed resources available in a Grid using Web Service technologies. We also consider other workflow systems and designs that offer better efficiency, and discuss several prospective architectures for further improvement.
