PRESISTANT: Data Pre-processing Assistant

A given classification algorithm may perform differently on datasets with different characteristics; for example, it might perform better on a dataset with continuous attributes than on one with categorical attributes, or the other way around. Datasets therefore typically need to be pre-processed to improve the results. Considering all possible pre-processing operators, the number of alternatives is staggeringly large, and inexperienced users become overwhelmed. Trial and error is not feasible with large amounts of data. We developed a method and tool, PRESISTANT, to answer the need for user assistance during data pre-processing. Leveraging ideas from meta-learning, PRESISTANT assists the user by recommending pre-processing operators that ultimately improve classification performance. The user selects a classification algorithm from those supported, and PRESISTANT then proposes candidate transformations to improve the result of the analysis. In the demonstration, participants will experience first-hand how PRESISTANT easily and effectively ranks the pre-processing operators.
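The recommendation step described above can be illustrated with a minimal sketch. It is not PRESISTANT's implementation: the meta-features, the operator names, and the weights in `TOY_META_MODEL` are all hypothetical stand-ins for a meta-model that would, in practice, be learned from many previously analysed datasets. The sketch only shows the general shape of the idea: describe the dataset by a few characteristics, predict the impact of each candidate operator for the chosen classifier, and rank the operators by that prediction instead of trying each one.

```python
from typing import Dict, List, Optional, Tuple, Union

Value = Optional[Union[float, str]]
Row = Dict[str, Value]
MetaFeatures = Dict[str, float]


def extract_meta_features(rows: List[Row]) -> MetaFeatures:
    """Compute two simple dataset characteristics (illustrative choice only)."""
    columns = list(rows[0].keys())
    n_cells = len(rows) * len(columns)
    # A cell counts as missing when its value is None.
    n_missing = sum(1 for row in rows for col in columns if row[col] is None)
    # A column counts as categorical when it holds at least one string value.
    n_categorical = sum(
        1 for col in columns if any(isinstance(row[col], str) for row in rows)
    )
    return {
        "frac_categorical": n_categorical / len(columns),
        "frac_missing": n_missing / n_cells,
    }


# Hypothetical stand-in for a learned meta-model: for each
# (classifier, operator) pair, a bias plus linear weights on meta-features.
# The numbers are made up for illustration, not taken from the paper.
TOY_META_MODEL: Dict[Tuple[str, str], Tuple[float, Dict[str, float]]] = {
    ("naive_bayes", "discretize"): (0.30, {"frac_categorical": -0.30}),
    ("naive_bayes", "replace_missing_values"): (0.00, {"frac_missing": 1.00}),
    ("naive_bayes", "normalize"): (0.05, {}),
}


def rank_operators(
    classifier: str, meta_features: MetaFeatures
) -> List[Tuple[str, float]]:
    """Return (operator, predicted gain) pairs, highest predicted gain first."""
    scored = []
    for (clf, operator), (bias, weights) in TOY_META_MODEL.items():
        if clf != classifier:
            continue
        gain = bias + sum(w * meta_features[name] for name, w in weights.items())
        scored.append((operator, gain))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Tiny example dataset: two numeric attributes, one categorical, some gaps.
rows: List[Row] = [
    {"age": 31.0, "income": 4200.0, "city": "Oslo"},
    {"age": None, "income": 3900.0, "city": "Bergen"},
    {"age": 45.0, "income": None, "city": "Oslo"},
    {"age": 28.0, "income": 5100.0, "city": None},
]

for operator, gain in rank_operators("naive_bayes", extract_meta_features(rows)):
    print(f"{operator}: predicted gain {gain:.2f}")
```

With these made-up weights, missing-value replacement ranks first for the example dataset because a quarter of its cells are empty. A real meta-model would replace the hand-written table with predictions learned from past experiments.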
