Process-Monitoring-for-Quality — Big Models

Abstract

Process Monitoring for Quality (PMQ) is a big data-driven quality philosophy aimed at defect detection (through binary classification) and empirical knowledge discovery. It was originally developed to solve a complex manufacturing quality problem. It is founded on Big Models, a predictive modeling paradigm based on machine learning, statistics, and optimization, whose learning scheme requires many candidate models to be developed and compared before the final model is selected. When dealing with big data, the data structure is not known in advance; therefore, there is no a priori distinction between learning algorithms, and there is a plethora of options to choose from. The learning scheme of Big Models, which is based on several well-known learning algorithms capable of effectively solving a wide spectrum of binary classification problems, is described. The main challenges of manufacturing pattern recognition problems are discussed and addressed to provide a strong foundation for the Big Models learning paradigm. Finally, two defect detection case studies with highly unbalanced data derived from real manufacturing systems are presented to validate the proposal.
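
To make the "many models, one final model" idea concrete, the following is a minimal sketch of that kind of learning scheme, not the authors' exact Big Models procedure: several well-known binary classifiers are trained on a synthetic, highly unbalanced data set and the final model is chosen by a cross-validated, imbalance-aware metric (here the Matthews correlation coefficient). The candidate list, data set, and metric are illustrative assumptions using scikit-learn.

```python
# Sketch of a model-selection loop over several well-known classifiers
# for an imbalanced binary classification problem. Illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, highly unbalanced defect-detection-like data (about 1% positives).
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.99, 0.01], random_state=0)

# A small pool of candidate learners; a real scheme would explore many more.
candidates = {
    "logistic_regression": make_pipeline(
        StandardScaler(),
        LogisticRegression(class_weight="balanced", max_iter=1000)),
    "svm": make_pipeline(StandardScaler(), SVC(class_weight="balanced")),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "random_forest": RandomForestClassifier(class_weight="balanced",
                                            random_state=0),
}

# Score every candidate with 5-fold cross-validation and keep the best.
scorer = make_scorer(matthews_corrcoef)
scores = {name: cross_val_score(model, X, y, cv=5, scoring=scorer).mean()
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(f"selected model: {best} (mean MCC = {scores[best]:.3f})")
```

The choice of an imbalance-aware metric matters here: with roughly 1% positives, plain accuracy would rate a classifier that never flags a defect at about 99%, which is why a cross-validated MCC (or a similar measure) is used to compare candidates in this sketch.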
