RULEM: A novel heuristic rule learning approach for ordinal classification with monotonicity constraints

Abstract In many real-world applications, classification models must be in line with domain knowledge and respect monotone relations between predictor variables and the target class in order to be acceptable for implementation. This paper presents a novel heuristic approach, called RULEM, to induce monotone ordinal rule-based classification models. The proposed approach can be applied in combination with any rule- or tree-based classification technique, since monotonicity is guaranteed in a post-processing step. RULEM checks whether a rule set or decision tree violates the imposed monotonicity constraints, and existing violations are resolved by inducing a set of additional rules which enforce monotone classification. The approach is able to handle non-monotone noise and can be applied to both partially and totally monotone problems with an ordinal target variable. Two novel justifiability measures are introduced which are based on RULEM and allow one to calculate the extent to which a classification model is in line with domain knowledge expressed in the form of monotonicity constraints. An extensive benchmarking experiment and a subsequent statistical analysis of the results on 14 public data sets indicate that RULEM preserves the predictive power of a rule induction technique while guaranteeing monotone classification. On the other hand, the post-processed rule sets are found to be significantly larger, which is due to the induction of additional rules. For example, when combined with Ripper, a median performance difference of zero and an average difference of −0.66% were observed in terms of PCC, with on average 5 rules added to the rule sets. The average and minimum justifiability of the original rule sets equal 92.66% and 34.44%, respectively, in terms of the RULEMF justifiability index, and 91.28% and 40.1% in terms of RULEMS, indicating a genuine need for monotonizing the rule sets.
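To illustrate the kind of monotonicity check that such a post-processing step relies on, the sketch below detects violations between pairs of rules in an ordinal rule-based classifier. It is a minimal, simplified illustration of the general idea and not the authors' RULEM algorithm: the Rule structure, the interval encoding of rule conditions, the find_violations function, and the income attribute are all illustrative assumptions, and the conflict-resolution step (inducing additional rules) is omitted.

```python
# Minimal sketch (NOT the paper's exact RULEM algorithm): given a rule-based
# ordinal classifier and a monotonicity constraint of the form "higher values
# of a monotone attribute should never lead to a lower predicted class",
# flag pairs of rules that can jointly violate the constraint.
from dataclasses import dataclass
from typing import Dict, Tuple, List


@dataclass
class Rule:
    # Each rule is a conjunction of interval conditions on numeric attributes
    # (attribute -> (lower, upper)) plus a predicted ordinal class (0 = lowest).
    conditions: Dict[str, Tuple[float, float]]
    predicted_class: int


def dominates(x: Dict[str, float], y: Dict[str, float],
              increasing_attrs: List[str]) -> bool:
    """True if instance x is at least as large as y on every monotone attribute."""
    return all(x[a] >= y[a] for a in increasing_attrs)


def find_violations(rules: List[Rule],
                    increasing_attrs: List[str]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) where some instance covered by rule i can
    dominate an instance covered by rule j while rule i predicts a strictly
    lower class -- a monotonicity violation."""
    violations = []
    for i, ri in enumerate(rules):
        for j, rj in enumerate(rules):
            if i == j or ri.predicted_class >= rj.predicted_class:
                continue
            # Best case for rule i: the upper bounds of its intervals;
            # worst case for rule j: the lower bounds of its intervals.
            hi_i = {a: ri.conditions[a][1] for a in increasing_attrs}
            lo_j = {a: rj.conditions[a][0] for a in increasing_attrs}
            if dominates(hi_i, lo_j, increasing_attrs):
                violations.append((i, j))
    return violations


# Hypothetical usage: a rule assigning a lower class to higher incomes
# conflicts with a rule assigning a higher class to lower incomes.
rules = [
    Rule(conditions={"income": (50.0, 100.0)}, predicted_class=0),
    Rule(conditions={"income": (0.0, 40.0)}, predicted_class=1),
]
print(find_violations(rules, increasing_attrs=["income"]))  # -> [(0, 1)]
```

In the same spirit, a justifiability-style score could be derived from the fraction of rule pairs that are free of such violations; the paper's RULEMF and RULEMS indices are defined in the full text and are not reproduced here.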
