A Data-Driven Knowledge Acquisition System: An End-to-End Knowledge Engineering Process for Generating Production Rules

Data-driven knowledge acquisition is one of the key research fields in data mining. Dealing with large amounts of data has received a lot of attention in the field recently, and a number of methodologies have been proposed to extract insights from data in an automated or semi-automated manner. However, these methodologies generally target a specific aspect of the data mining process, such as data acquisition, data preprocessing, or data classification, whereas a comprehensive knowledge acquisition method is crucial to support the end-to-end knowledge engineering process. In this paper, we introduce a knowledge acquisition system that covers all major phases of the cross-industry standard process for data mining (CRISP-DM). Acknowledging the importance of an end-to-end knowledge engineering process, we designed and developed an easy-to-use data-driven knowledge acquisition tool (DDKAT). The major features of the DDKAT are: (1) a novel unified features scoring approach for data selection; (2) a user-friendly data processing interface to improve the quality of the raw data; (3) an appropriate decision tree algorithm selection approach to build a classification model; and (4) the automated generation of production rules from various decision tree classification models. Furthermore, two studies on diabetes data were performed to assess the value of the DDKAT in terms of user experience. A total of 19 experts were involved in the first study, and 102 students in the artificial intelligence domain were involved in the second. The results showed that the overall user experience of the DDKAT was positive in terms of its attractiveness, as well as its pragmatic and hedonic quality factors.
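
As an illustration of feature (4), the general idea of turning a decision tree into production rules can be sketched as follows: each root-to-leaf path becomes one IF-THEN rule whose antecedent is the conjunction of the tests along that path. This is a minimal sketch under stated assumptions, not the DDKAT implementation; the `Leaf` and `Node` classes, the `extract_rules` function, and the toy tree with its feature names and thresholds are all hypothetical and carry no clinical meaning.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    """Terminal node carrying a class label (hypothetical structure)."""
    label: str


@dataclass
class Node:
    """Binary split on a numeric feature (hypothetical structure)."""
    feature: str
    threshold: float
    left: "Tree"   # branch taken when feature <= threshold
    right: "Tree"  # branch taken when feature > threshold


Tree = Union[Leaf, Node]


def extract_rules(tree: Tree, conditions: Tuple[str, ...] = ()) -> List[str]:
    """Return one IF-THEN rule per root-to-leaf path of the tree."""
    if isinstance(tree, Leaf):
        # A tree that is a single leaf has no tests, so its rule always fires.
        antecedent = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {antecedent} THEN class = {tree.label}"]
    left_cond = f"{tree.feature} <= {tree.threshold}"
    right_cond = f"{tree.feature} > {tree.threshold}"
    return (extract_rules(tree.left, conditions + (left_cond,))
            + extract_rules(tree.right, conditions + (right_cond,)))


# Toy tree for illustration only; the values are assumptions, not study data.
toy_tree = Node(
    "glucose", 126.0,
    Leaf("negative"),
    Node("bmi", 30.0, Leaf("negative"), Leaf("positive")),
)

rules = extract_rules(toy_tree)
for rule in rules:
    print(rule)
```

Enumerating paths in this way yields rules that are mutually exclusive and jointly cover the input space, which is why a tree can be rewritten as a rule set without changing its predictions. Any simplification or merging of rules across several tree models would be a separate step that this sketch does not attempt.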
