Towards Intelligent Data Analysis: The Metadata Challenge

When analyzed correctly, data can yield substantial benefits. The process of analyzing data and transforming it into knowledge is known as Knowledge Discovery in Databases (KDD). The sheer number and subtlety of the algorithms available at the different steps of KDD make the process challenging. Effective user support is therefore crucial, all the more so now that analysis is performed on Big Data. Metadata is the essential component for driving such support. In this paper, we study the metadata required to provide user support at every stage of the KDD process. We show that intelligent systems addressing user assistance in KDD are incomplete in this regard: they do not exploit the full potential of metadata to provide assistance throughout the whole process. We present a comprehensive classification of all the metadata required to provide user support. Furthermore, we present our implementation of a metadata repository for storing and managing this metadata and explain its benefits in a real Big Data analytics project.