Because multivariate statistics are increasing in popularity with social science researchers, the challenge of detecting multivariate outliers warrants attention. Outliers are defined as cases which, in regression analyses, generally lie more than three standard deviations from Yhat and therefore distort statistics. Some outliers, however, do not distort statistics: those lying near the mean of Yhat along the regression line. In univariate analyses, outliers can be found using Casewise Diagnostics in the Statistical Package for the Social Sciences (SPSS) version 9.0, which has a three-standard-deviation default that the researcher can easily change. In bivariate and multivariate analyses, finding outliers more than three standard deviations from Yhat is not as easy. Casewise Diagnostics will detect outliers on "Y"; however, in multivariate analyses, statistics can be distorted by a case lying within the arbitrary three standard deviations, because such a case can exert so much influence, or leverage, on the regression line that the line itself is distorted. There are two popular ways of detecting this leverage: distance calculations and influence calculations. The most popular distance statistic for detecting outliers is the Mahalanobis distance. Several other ways of detecting leverage in multivariate cases are available in SPSS 9.0. Once a researcher has identified a case as a possible outlier, the choices are to determine whether there has been an error in recording the data or whether the case is truly an outlier. It can be argued that there will always be outliers in the population as a whole, which is an argument for keeping the score, because it reflects something natural about the general population. If the researcher decides to drop the case, that decision should be reported along with the reasons for it. (Contains 10 figures, 3 tables, and 20 references.)
(Author/SLD)
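The two screening steps described in the abstract can be sketched in code. The following is a minimal NumPy illustration of a three-standard-deviation residual screen and the squared Mahalanobis distance, written under stated assumptions; the function names and the chi-square flagging rule in the comments are illustrative conventions, not the exact procedure SPSS 9.0 implements.

```python
import numpy as np

def standardized_residuals(X, y):
    """Fit OLS of y on X (with an intercept) and return residuals in
    standard-deviation units, for the three-SD casewise screen."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return resid / resid.std(ddof=X1.shape[1])

def mahalanobis_sq(X):
    """Squared Mahalanobis distance of each case from the centroid of
    the predictors; large values indicate leverage in predictor space."""
    X = np.asarray(X, dtype=float)
    diff = X - X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    return np.einsum('ij,jk,ik->i', diff, cov_inv, diff)

# A case is commonly flagged when |standardized residual| > 3, or when its
# squared Mahalanobis distance exceeds a chi-square critical value with
# p (number of predictors) degrees of freedom.
```

Cases flagged by either screen should then be checked for recording errors before any decision to retain or drop them, as the abstract recommends.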