Multicriteria variable selection for classification of production batches

In many industrial processes, hundreds of noisy, correlated process variables are collected for monitoring and control. A common goal is to classify production batches into classes, such as good or failed, based on these variables. We propose a method for selecting the process variables best suited to classifying batches under multiple criteria: classification performance, measured by sensitivity and specificity, and measurement cost. The method first applies Partial Least Squares (PLS) regression to the training set to derive an importance index for each variable. An iterative classification/elimination procedure using the k-Nearest Neighbor (kNN) classifier is then carried out. Finally, Pareto analysis is used to select the best subset of variables and to avoid retaining more variables than necessary. The proposed method consistently selects the process variables important for classification, regardless of which batches are included in the training data. We demonstrate its advantages on six industrial datasets.
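The three stages described above (PLS-based importance index, kNN classification with stepwise elimination, Pareto analysis) can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation, and several details are assumptions made for illustration only: the importance index is taken as the absolute first-component PLS1 weight on autoscaled data, the least important variable is dropped at each step, sensitivity and specificity are computed on a hold-out set, the measurement cost of a subset is the sum of hypothetical per-variable costs, and the data are synthetic. All function names are hypothetical.

```python
import math
import random

def pls_importance(X, y):
    """Assumed index: absolute first-component PLS1 weights (w proportional to X'y), autoscaled X."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    w = []
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1)) or 1.0
        w.append(sum((v - mean) / sd * (t - y_mean) for v, t in zip(col, y)))
    norm = math.sqrt(sum(v * v for v in w)) or 1.0
    return [abs(v) / norm for v in w]

def knn_predict(train_X, train_y, x, cols, k=3):
    """Majority vote among the k nearest training batches, using only the variables in cols."""
    nearest = sorted(
        (sum((row[j] - x[j]) ** 2 for j in cols), label)
        for row, label in zip(train_X, train_y)
    )[:k]
    return 1 if 2 * sum(label for _, label in nearest) > k else 0

def sensitivity_specificity(train_X, train_y, test_X, test_y, cols, k=3):
    """Fraction of class-1 and of class-0 hold-out batches classified correctly."""
    pairs = [(knn_predict(train_X, train_y, x, cols, k), t)
             for x, t in zip(test_X, test_y)]
    tp = sum(1 for pred, t in pairs if pred == 1 and t == 1)
    tn = sum(1 for pred, t in pairs if pred == 0 and t == 0)
    pos = sum(test_y)
    return tp / pos, tn / (len(test_y) - pos)

def eliminate(train_X, train_y, test_X, test_y, importance, costs, k=3):
    """Classify, record the criteria, drop the least important variable, repeat."""
    cols = sorted(range(len(importance)), key=lambda j: importance[j], reverse=True)
    candidates = []
    while cols:
        sens, spec = sensitivity_specificity(train_X, train_y, test_X, test_y, cols, k)
        candidates.append({"vars": sorted(cols), "sens": sens, "spec": spec,
                           "cost": sum(costs[j] for j in cols)})
        cols = cols[:-1]  # remove the variable with the lowest importance index
    return candidates

def dominates(a, b):
    """a dominates b: no worse on every criterion and strictly better on at least one."""
    no_worse = (a["sens"] >= b["sens"] and a["spec"] >= b["spec"]
                and a["cost"] <= b["cost"])
    better = a["sens"] > b["sens"] or a["spec"] > b["spec"] or a["cost"] < b["cost"]
    return no_worse and better

def pareto_front(candidates):
    """Keep the variable subsets that no other subset dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]

# Synthetic batches (assumption): variable 0 separates the classes, the rest are noise.
random.seed(0)
y = [i % 2 for i in range(60)]
X = [[5.0 * label + random.gauss(0, 1)] + [random.gauss(0, 1) for _ in range(5)]
     for label in y]
costs = [1.0] * 6  # hypothetical unit measurement cost per variable
train_X, train_y, test_X, test_y = X[:40], y[:40], X[40:], y[40:]

importance = pls_importance(train_X, train_y)
candidates = eliminate(train_X, train_y, test_X, test_y, importance, costs)
front = pareto_front(candidates)
for c in front:
    print(c)
```

Because the candidate subsets are nested and every cost is positive, the smallest subset always has the lowest cost and therefore always appears on the Pareto front; larger subsets survive only if they improve sensitivity or specificity. A final choice among the non-dominated subsets still requires a preference between classification performance and cost.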
