BIGOWL: Knowledge centered Big Data analytics

Abstract. Knowledge extraction and incorporation are currently considered beneficial for efficient Big Data analytics. Knowledge can take part in workflow design, constraint definition, parameter selection and configuration, and human-interactive and decision-making strategies. This paper proposes BIGOWL, an ontology to support knowledge management in Big Data analytics. BIGOWL is designed to cover a wide vocabulary of terms concerning Big Data analytics workflows, including their components and how they are connected, from data sources to the visualization of analytics results. It also takes into consideration aspects such as parameters, restrictions and formats. The ontology defines not only the taxonomic relationships between the different concepts, but also instances representing specific individuals, to guide users in the design of Big Data analytics workflows. For testing purposes, two case studies are developed: first, real-world stream processing with Spark of traffic Open Data, for route optimization in the urban environment of New York City; and second, data mining classification of an academic dataset on local and cloud platforms. The analytics workflows resulting from the BIGOWL semantic model are validated and successfully evaluated.
