Statistical Approach to Ordinal Classification with Monotonicity Constraints

In ordinal classification with monotonicity constraints, it is assumed that the class label of an object does not decrease when the evaluations of this object on the considered attributes increase. In this paper, we formulate the problem of ordinal classification with monotonicity constraints from a statistical point of view, by imposing constraints both on the probability distribution and on the loss function. We propose a procedure for “monotonizing” the data by relabeling objects, based on minimizing the empirical risk over the class of all monotone functions. The procedure is then used as a preprocessing tool, improving the accuracy of classifiers. We verify these claims in a computational experiment.
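
To make the relabeling idea concrete, the following is a minimal sketch and not the procedure from the paper. It assumes a single attribute (so objects are totally ordered rather than compared by dominance on several attributes), integer class labels 0..n_classes-1, and the absolute-error loss |y - y'|; the function name `monotonize_1d` is hypothetical. Under these assumptions, a relabeling that minimizes the empirical risk over all monotone functions can be found by simple dynamic programming.

```python
from typing import Dict, List, Sequence


def monotonize_1d(x: Sequence[float], y: Sequence[int], n_classes: int) -> List[int]:
    """Relabel objects so that labels are nondecreasing in x,
    minimizing the total absolute error sum(|y_i - new_i|).

    Assumptions (illustrative only): one attribute, labels in 0..n_classes-1.
    """
    # Objects with identical x must share one label under a monotone function.
    groups: Dict[float, List[int]] = {}
    for i, xi in enumerate(x):
        groups.setdefault(xi, []).append(i)
    keys = sorted(groups)
    if not keys:
        return []

    inf = float("inf")
    cost: List[List[float]] = []  # cost[g][k]: best loss for groups 0..g, group g labeled k
    back: List[List[int]] = []    # back[g][k]: label chosen for group g-1 in that solution
    for g, key in enumerate(keys):
        row: List[float] = []
        brow: List[int] = []
        best_prev, best_arg = inf, -1  # running minimum over previous labels <= k
        for k in range(n_classes):
            local = sum(abs(y[i] - k) for i in groups[key])
            if g == 0:
                row.append(local)
                brow.append(-1)
            else:
                if cost[g - 1][k] < best_prev:
                    best_prev, best_arg = cost[g - 1][k], k
                row.append(local + best_prev)
                brow.append(best_arg)
        cost.append(row)
        back.append(brow)

    # Backtrack from the cheapest label of the last group.
    k = min(range(n_classes), key=lambda c: cost[-1][c])
    new_labels = [0] * len(y)
    for g in range(len(keys) - 1, -1, -1):
        for i in groups[keys[g]]:
            new_labels[i] = k
        k = back[g][k]
    return new_labels
```

For example, with x = [1, 2, 3, 4] and y = [0, 2, 1, 2], the labels 2 and 1 violate monotonicity, and the sketch returns a nondecreasing relabeling that changes the labels by a total of 1. The relabeled data could then be passed to any classifier, which is the preprocessing role described above.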
