FLANDM: a development framework of domain-specific languages for data mining democratisation

Abstract

Companies are increasingly interested in employing data mining to take advantage of the vast amounts of data their systems now store. This interest confronts two problems: (1) business experts usually lack the skills required to apply data mining techniques, and (2) the specialists who know how to use these techniques are a scarce and valuable asset. To help democratise data mining, we proposed in previous work the development of domain-specific languages (DSLs) that hide the complexity of data mining techniques. These DSLs aim to allow business experts to specify analysis processes using high-level primitives and terminology from the application domain. These specifications would then be automatically transformed into a low-level, executable form. Although these DSLs might offer a promising solution to the aforementioned problems, developing them from scratch requires considerable effort, which makes them costly. To make these languages affordable, we present FLANDM, an ecosystem devised for the rapid development of DSLs for data mining democratisation. FLANDM provides a base infrastructure that can be easily customised to the particularities of each domain, enabling controlled and systematic reuse of previously developed artefacts. Using FLANDM, new DSLs for data mining democratisation can be defined with a 50% reduction in development costs.
