Orange4WS Environment for Service-Oriented Data Mining

Novel data mining tasks in e-science involve mining distributed, highly heterogeneous data and knowledge sources. However, standard data mining platforms, such as Weka and Orange, apply only their own data mining algorithms when discovering knowledge from local data sources. In contrast, next-generation data mining technologies should support three things: processing of distributed data sources, the use of data mining algorithms implemented as web services, and formal descriptions of data sources and knowledge discovery tools in the form of ontologies, which enable automated composition of complex knowledge discovery workflows for a given data mining task. This paper proposes a novel Service-oriented Knowledge Discovery framework and its implementation in the service-oriented data mining environment Orange4WS (Orange for Web Services). Orange4WS is based on the existing Orange data mining toolbox and its visual programming environment, which enables manual composition of data mining workflows. Orange4WS adds the following new features: simple use of web services as remote components that can be included in a data mining workflow; simple incorporation of relational data mining algorithms; and a knowledge discovery ontology that describes workflow components (data, knowledge and data mining services) in an abstract and machine-interpretable way, together with a planner that uses this ontology to compose data mining workflows automatically. These new features are showcased in three real-world scenarios.
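The idea of composing a workflow automatically from machine-interpretable component descriptions can be illustrated with a minimal sketch. It is not the Orange4WS planner or its API: the `Component` type, the component names, and the data type labels are all hypothetical, and a plain breadth-first search over sets of available data types stands in for ontology-based planning. Each component is assumed to declare only the types it consumes and the types it produces.

```python
from collections import deque
from typing import NamedTuple, Optional


class Component(NamedTuple):
    """Hypothetical component description: a name plus input/output type labels."""
    name: str
    inputs: frozenset
    outputs: frozenset


def plan_workflow(components, available, goal) -> Optional[list]:
    """Breadth-first search for a shortest component sequence producing `goal`.

    A state is the set of data types produced so far. A component is
    applicable when all of its inputs are in the current state.
    """
    start = frozenset(available)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        if goal in state:
            return steps
        for comp in components:
            # Skip components that cannot run yet or would add nothing new.
            if comp.inputs <= state and not comp.outputs <= state:
                next_state = state | comp.outputs
                if next_state not in seen:
                    seen.add(next_state)
                    queue.append((next_state, steps + [comp.name]))
    return None  # no sequence of the given components reaches the goal


# Hypothetical component catalogue (illustrative names only).
COMPONENTS = [
    Component("load_relational_data", frozenset({"DatabaseURL"}), frozenset({"RelationalData"})),
    Component("propositionalize", frozenset({"RelationalData"}), frozenset({"Table"})),
    Component("discover_subgroups", frozenset({"Table"}), frozenset({"Rules"})),
    Component("discretize", frozenset({"Table"}), frozenset({"DiscreteTable"})),
]

plan = plan_workflow(COMPONENTS, {"DatabaseURL"}, "Rules")
print(plan)
```

In this toy catalogue the search returns the three-step chain from loading relational data to subgroup discovery and leaves out `discretize`, which does not contribute to the goal. A real planner would also reason over preconditions, effects, and class hierarchies defined in the ontology, none of which this sketch models.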
