Real-time contrasts control chart using random forests with weighted voting

We propose RTC control charts using random forests with weighted voting. F-measure, G-mean, and MCC are used as performance measures to assign proper weights. Our method detects faults more rapidly by making the monitoring statistics continuous. Our method can identify where a fault occurs because a tree-based classifier is used. Experiments demonstrated that our method is more effective than the existing methods.

Real-time fault detection and isolation are important tasks in process monitoring. A real-time contrasts (RTC) control chart converts the process monitoring problem into a real-time classification problem and outperforms existing methods. However, the monitoring statistics of the original RTC chart are discrete, which could make fault detection less efficient. To make the monitoring statistics continuous, distance-based RTC control charts using support vector machines (SVM) and kernel linear discriminant analysis (KLDA) were proposed. Although the distance-based RTC charts outperformed the original RTC chart, they have the disadvantage that the causes of faults are difficult to analyze. Therefore, we propose improved RTC control charts using random forests with weighted voting. These improved charts detect changes more rapidly by making the monitoring statistics continuous, and they can also analyze the causes of faults in a manner similar to the original RTC chart. Further, the improved RTC control charts alleviate the class imbalance problem by using F-measure, G-mean, and the Matthews correlation coefficient (MCC) as performance measures to assign proper weights to individual classifiers. Experiments show that the proposed methods outperform the original RTC chart and are more effective than the distance-based RTC charts using SVM and KLDA.
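The weighted-voting idea can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how the weights are normalized or on which data each tree is scored, so the evaluation labels, the per-tree predictions, the clipping of negative weights, and all function names below are assumptions made for illustration. Each tree is scored with one of the three measures, and the scores serve as voting weights, so the resulting statistic is no longer restricted to multiples of 1/(number of trees).

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn), treating label 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def f_measure(tp, tn, fp, fn):
    """F1 score: harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def g_mean(tp, tn, fp, fn):
    """Geometric mean of sensitivity and specificity."""
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sens * spec)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, defined as 0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def weighted_vote(votes, weights):
    """Weighted fraction of trees voting for class 1.

    Assumption: negative weights (possible with MCC) are clipped to zero,
    and the plain vote fraction is used if every weight is zero.
    """
    w = [max(x, 0.0) for x in weights]
    total = sum(w)
    if total == 0:
        return sum(votes) / len(votes)
    return sum(wi * v for wi, v in zip(w, votes)) / total


# Hypothetical imbalanced evaluation set: 0 = reference data, 1 = window data.
y_eval = [0, 0, 0, 0, 0, 0, 1, 1]
# Hypothetical predictions of three trees on that evaluation set.
tree_preds = [
    [0, 0, 0, 0, 0, 0, 1, 1],  # perfect tree
    [0, 0, 0, 0, 0, 0, 0, 0],  # always predicts the majority class
    [0, 0, 0, 0, 1, 1, 1, 0],  # noisy tree
]
weights = [f_measure(*confusion_counts(y_eval, p)) for p in tree_preds]

# Votes of the three trees for one new observation.
new_votes = [1, 0, 0]
stat_unweighted = sum(new_votes) / len(new_votes)
stat_weighted = weighted_vote(new_votes, weights)
```

In this toy case the tree that always predicts the majority class receives zero weight under the F-measure, which is one way such measures can reduce the influence of classifiers that ignore the minority class. Replacing `f_measure` with `g_mean` or `mcc` in the `weights` line switches the weighting measure.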
