Developing machine-learning regression model with Logical Analysis of Data (LAD)

Abstract: This paper proposes a regression model based on Logical Analysis of Data (LAD). LAD is a combinatorial, Boolean, supervised data mining technique for pattern generation. It is used mainly for classification problems, where it has demonstrated high accuracy compared to other classification techniques. In this paper, we extend LAD to supervised data with continuous responses and derive a LAD regression model (LADR). Three discretization methods that transform the values of the response into a set of thresholds are tested. At each threshold, LAD analyzes the data as a two-class classification problem and extracts the prescriptive patterns for each class. LADR then uses the patterns generated from the original data by the cbmLAD software to fit the continuous numerical response. The result is a normalized regression model with only binary independent variables. LADR was applied to six datasets and obtained better results than linear regression (LR), support vector regression (SVR), decision tree regression (DTR), random forest (RF), and polynomial regression (PolyR). Performance is evaluated by the mean square error (MSE), the coefficient of determination (R²), and the mean absolute error (MAE), based on 10-fold cross-validation.
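The thresholding step and the three evaluation metrics can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the three discretization methods, so equal-width cut points are assumed here, and the helper names (`equal_width_thresholds`, `two_class_labels`, `mse`, `mae`, `r2`) are hypothetical. Pattern generation with cbmLAD is not reproduced.

```python
from statistics import mean


def equal_width_thresholds(y, k):
    """Return k cut points evenly spaced strictly between min(y) and max(y).

    Assumption: equal-width discretization stands in for the paper's
    (unnamed) discretization methods.
    """
    lo, hi = min(y), max(y)
    step = (hi - lo) / (k + 1)
    return [lo + step * (i + 1) for i in range(k)]


def two_class_labels(y, threshold):
    """Label an observation 1 if its response exceeds the threshold, else 0."""
    return [1 if v > threshold else 0 for v in y]


def mse(y_true, y_pred):
    """Mean square error."""
    return mean((a - b) ** 2 for a, b in zip(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(a - b) for a, b in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - y_bar) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot


# Toy response values (illustrative only, not from the paper's datasets).
y = [1.0, 2.0, 3.0, 4.0, 5.0]

# Each threshold defines one two-class problem on the same observations.
thresholds = equal_width_thresholds(y, 3)
labels = [two_class_labels(y, t) for t in thresholds]

# Hypothetical predictions, used only to exercise the metric functions.
y_pred = [1.5, 2.0, 2.5, 4.0, 5.5]
scores = (mse(y, y_pred), mae(y, y_pred), r2(y, y_pred))
```

Each entry of `labels` is the class vector that a two-class learner such as LAD would receive at the corresponding threshold. In a 10-fold cross-validation, the metric functions would be computed on each held-out fold and then averaged.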
