MEX vocabulary: a lightweight interchange format for machine learning experiments

Over the last decades, many machine learning experiments have been published, benefiting scientific progress. To compare machine learning experiment results with each other and to collaborate effectively, the experiments need to be performed thoroughly in the same computing environment, using the same sample datasets and algorithm configurations. In addition, practical experience shows that scientists and engineers tend to produce large amounts of output data in their experiments, which are difficult both to analyze and to archive properly without provenance metadata. However, the Linked Data community still lacks a lightweight specification for interchanging machine learning metadata across different architectures to achieve a higher level of interoperability. In this paper, we address this gap by presenting a novel vocabulary dubbed MEX. We show that MEX provides a quick way to describe experiments, with a special focus on data provenance, and fulfills the requirements for long-term maintenance.
