Multi-pattern generation framework for logical analysis of data

Logical analysis of data (LAD) is a rule-based data mining method that uses combinatorial optimization and Boolean logic for binary classification. The goal is to construct a classification model consisting of logical patterns (rules) that capture structured information from observations. Of the four steps of the LAD framework (binarization, feature selection, pattern generation, and model construction), pattern generation is considered the most important. Most of the literature has studied combinatorial enumeration approaches that generate all possible patterns; however, the computational cost of these approaches grows exponentially with the data (feature) size. To overcome this problem, recent studies proposed column generation-based approaches to improve the efficacy of building a LAD model with a maximum-margin objective, but solving the pattern-generating subproblems efficiently remained difficult. This study proposes a new column generation framework in which a new mixed-integer linear programming approach generates multiple patterns with maximum coverage in the subproblems at each iteration. In addition to the maximum-margin objective, we propose an alternative minimum-pattern objective that solves the LAD problem as a minimum set covering problem. The proposed approaches are evaluated on datasets from the University of California Irvine Machine Learning Repository. In the computational experiments, they perform comparably to previous LAD methods and other well-known classification algorithms.
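
To make the notions of pattern, coverage, and the minimum-pattern (set covering) objective concrete, the following is a minimal Python sketch on a hypothetical toy dataset. It is not the method of this study: it enumerates patterns by brute force and selects them with a greedy set-cover heuristic, whereas the study uses mixed-integer linear programming inside column generation. The data, the function names (`covers`, `positive_patterns`, `greedy_cover`), and the `max_degree` parameter are all assumptions made for illustration.

```python
from itertools import combinations, product

# Hypothetical binarized data: each tuple is one observation over 3 binary features.
positives = [(1, 1, 0), (1, 1, 1), (0, 1, 1)]
negatives = [(0, 0, 1), (1, 0, 0)]


def covers(pattern, obs):
    """A pattern is a dict {feature_index: required_value} (a conjunction of
    literals); it covers an observation if every literal matches."""
    return all(obs[j] == v for j, v in pattern.items())


def positive_patterns(pos, neg, max_degree=2):
    """Brute-force enumeration (illustration only): conjunctions of up to
    max_degree literals that cover at least one positive and no negative."""
    n = len(pos[0])
    found = []
    for d in range(1, max_degree + 1):
        for idx in combinations(range(n), d):
            for vals in product((0, 1), repeat=d):
                p = dict(zip(idx, vals))
                if any(covers(p, o) for o in neg):
                    continue  # not a pure positive pattern
                cov = frozenset(i for i, o in enumerate(pos) if covers(p, o))
                if cov:
                    found.append((p, cov))
    return found


def greedy_cover(patterns, n_pos):
    """Greedy heuristic for set covering (a stand-in for an exact
    minimum-pattern model): repeatedly pick the pattern that covers the
    most still-uncovered positive observations."""
    uncovered = set(range(n_pos))
    chosen = []
    while uncovered and patterns:
        p, cov = max(patterns, key=lambda pc: len(pc[1] & uncovered))
        if not cov & uncovered:
            break  # remaining observations cannot be covered
        chosen.append(p)
        uncovered -= cov
    return chosen, uncovered


patterns = positive_patterns(positives, negatives)
chosen, uncovered = greedy_cover(patterns, len(positives))
print(chosen, uncovered)
```

On this toy data the single pattern "feature 1 equals 1" covers all three positive observations and no negative one, so the greedy selection returns one pattern. The exponential growth of the enumeration loop with the number of features is the cost that motivates generating patterns on demand instead.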
