Layman Analytics System: A Cloud-Enabled System for Data Analytics Workflow Recommendation

In today’s big data era, an enormous amount of data is available. Layman users lack not only the knowledge and experience in data analytics needed to make sense of these data but also the computational resources to run the analytics. In this paper, we propose and develop a layman analytics system (LAS), which provides layman users with a scalable, ready-to-use analytics tool that automatically generates analytics workflows for classification tasks. The LAS is designed to build on existing open-source data analytics tools through generic ontological modeling of their analytics operators and through adaptive constraint refinement for metadata learning. Moreover, the LAS can be deployed on both public and private clouds to meet the need for scalable computing and easy maintenance. To demonstrate the performance of the LAS, we conducted experiments with 114 data sets obtained from the University of California Irvine Machine Learning Repository. The workflows generated by the LAS were benchmarked against OpenML, where each data set has a range of classification accuracies obtained using classifiers designed and fine-tuned by data experts. On 87 of the 114 data sets, the LAS exceeded the 50th percentile of the benchmark accuracies. On 49 of these 87 data sets, the LAS also outperformed the 90th percentile of the benchmarks.
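The percentile comparison described in the abstract can be illustrated with a minimal sketch. It assumes that, for one data set, a list of benchmark accuracies is available and that percentiles are computed by linear interpolation between sorted values; the abstract does not state which percentile convention was used. The function names and all accuracy values below are hypothetical and are not taken from the paper.

```python
def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between sorted values.

    The interpolation convention is an assumption made for this sketch.
    """
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])


def exceeds_percentiles(workflow_accuracy, benchmark_accuracies, levels=(50, 90)):
    """Map each percentile level to whether the workflow accuracy exceeds it."""
    return {
        level: workflow_accuracy > percentile(benchmark_accuracies, level)
        for level in levels
    }


# Hypothetical benchmark accuracies for a single data set (illustration only).
benchmark = [0.71, 0.74, 0.78, 0.80, 0.83, 0.85, 0.88, 0.90, 0.91, 0.93]

# Hypothetical accuracy of one generated workflow on the same data set.
flags = exceeds_percentiles(0.89, benchmark)
print(flags)
```

Applying such a check to each of the 114 data sets and counting the data sets that pass at each level would produce tallies of the kind reported above.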
