An enhanced Monte Carlo outlier detection method

Outlier detection is crucial for building a highly predictive model. In this study, we propose an enhanced Monte Carlo outlier detection method that establishes cross‐prediction models from samples already determined to be normal and then analyzes the distribution of prediction errors for each dubious sample individually. One simulated and three real datasets were used to illustrate and validate the method, and the results indicate that it outperforms Monte Carlo outlier detection in outlier diagnosis. After the detected outliers were removed, the root mean square error of prediction in the validation on Kovats retention indices decreased from 3.195 to 1.655, and the average cross‐validation prediction error decreased from 2.0341 to 1.2780. The method thus helps establish a good model by eliminating outliers. © 2015 Wiley Periodicals, Inc.
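The cross‐prediction idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of ordinary least squares as the base model, the number of models, the training fraction, and the synthetic data are all assumptions made for illustration. Many models are fitted on random subsets of the samples assumed to be normal, and each dubious sample is characterized by the mean and spread of its own prediction errors across those models.

```python
import numpy as np


def cross_prediction_errors(X, y, normal_idx, dubious_idx,
                            n_models=200, train_frac=0.8, seed=0):
    """Mean and standard deviation of prediction errors per dubious sample.

    Hypothetical helper: each model is an ordinary least-squares fit on a
    random subset of the normal samples (the base model is an assumption).
    """
    rng = np.random.default_rng(seed)
    normal_idx = np.asarray(normal_idx)
    dubious_idx = np.asarray(dubious_idx)

    # Keep enough training samples to fit all coefficients plus an intercept.
    n_train = min(normal_idx.size,
                  max(X.shape[1] + 2, int(train_frac * normal_idx.size)))

    # Design matrix with an intercept column.
    A = np.column_stack([np.ones(X.shape[0]), X])

    errors = np.empty((n_models, dubious_idx.size))
    for m in range(n_models):
        # Monte Carlo step: draw a random training subset of normal samples.
        train = rng.choice(normal_idx, size=n_train, replace=False)
        coef, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
        # Prediction error of every dubious sample under this model.
        errors[m] = y[dubious_idx] - A[dubious_idx] @ coef

    return errors.mean(axis=0), errors.std(axis=0)


def make_demo(seed=1):
    """Synthetic linear data with two planted outliers (illustrative only)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(60, 3))
    y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=60)
    y[[55, 57]] += 5.0  # shift two responses to create outliers
    return X, y


if __name__ == "__main__":
    X, y = make_demo()
    normal = np.arange(50)        # samples assumed to be normal
    dubious = np.arange(50, 60)   # samples still under suspicion
    mean_err, sd_err = cross_prediction_errors(X, y, normal, dubious)
    for i, mu, sd in zip(dubious, mean_err, sd_err):
        print(f"sample {i}: mean error {mu:+.3f}, sd {sd:.3f}")
```

In this sketch a dubious sample whose mean prediction error lies far from zero relative to the others would be a candidate outlier. The decision rule itself is left open here because the abstract does not specify one.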
