A Model-Driven Ecosystem for the Definition of Data Mining Domain-Specific Languages

Data mining techniques are making their way into today's companies, allowing business users to make informed decisions based on their available data. However, these business experts usually lack the knowledge to perform the analysis themselves, which makes it necessary to rely on data mining experts. In an attempt to solve this problem, we previously studied the definition of domain-specific languages that allowed data mining processes to be specified without requiring experience in the applied techniques. The specification was made through high-level language primitives that referred only to familiar concepts and terms from the original domain of the data, so the technical details of the mining processes were hidden from the end user. Although these languages are a promising solution, developing them can be challenging and costly. This work describes a development ecosystem devised for generating these languages, starting from a generic perspective that can be specialized to the details of each domain.
