Rule Extraction via Dynamic Discretization with an Application to Air Quality Modelling

Association rule extraction is a well-known and important problem in machine learning, especially in the sub-field of explainable machine learning. Association rules are naturally extracted from data sets with Boolean (or at least categorical) attributes. For rule extraction algorithms to be applicable to data sets with numerical attributes as well, the data must be suitably discretized, and a great amount of work has been devoted to finding good discretization algorithms, given that optimal discretization is an NP-hard problem. Motivated by a specific application, in this paper we provide a novel discretization algorithm, defined as a (heuristic) optimization problem and solved by an evolutionary algorithm. We test its performance against well-known available solutions and show experimentally that it allows us to extract more rules, and to do so more easily.
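To make the role of discretization concrete, the following minimal sketch bins a numerical attribute and then measures the support and confidence of one rule over the resulting categories. It uses plain equal-width binning as a stand-in, not the evolutionary discretization proposed in the paper; the function names, the toy traffic-flow values, and the rule itself are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_right

def equal_width_cuts(values, n_bins):
    """Return the n_bins - 1 interior cut points of an equal-width binning."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]

def discretize(value, cuts):
    """Map a numeric value to the index of the bin it falls into."""
    return bisect_right(cuts, value)

def support_confidence(rows, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent."""
    n_ante = sum(1 for r in rows if antecedent(r))
    n_both = sum(1 for r in rows if antecedent(r) and consequent(r))
    support = n_both / len(rows)
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence

# Hypothetical toy data: (traffic_flow, high_no2) pairs, invented for this sketch.
data = [(100, False), (200, False), (300, False), (800, True),
        (900, True), (1000, True), (950, False), (150, False)]

flows = [flow for flow, _ in data]
cuts = equal_width_cuts(flows, 3)
rows = [(discretize(flow, cuts), high) for flow, high in data]

# Rule: "traffic flow in the top bin -> high NO2"
support, confidence = support_confidence(
    rows,
    antecedent=lambda r: r[0] == 2,
    consequent=lambda r: r[1],
)
print(cuts, support, confidence)
```

Moving the cut points changes which rows satisfy the antecedent, and therefore which rules pass a given support or confidence threshold; this is why the choice of discretization matters for the number and quality of extracted rules.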
