Preparatory Data Analysis

Preparatory data analyses (data screening) are conducted before a main analysis to assess the fit between the data and the assumptions of that main analysis. Different main analyses have different assumptions that vary in importance; violation of some assumptions can lead to the wrong inferential conclusion (and a potential failure of replication) while violation of others yields an analysis that is correct as far as it goes, but misses certain additional relationships in the data. Assumptions that are often relevant for continuous variables are normality of sampling distributions, pairwise linearity, absence of outliers and collinearity, independence of errors, and homoscedasticity; these are evaluated by both graphical and statistical methods. When violation is detected, variables are often transformed or an alternative analytic strategy is employed. Relevant issues in the choice of when and how to screen are the level of measurement of the variables, whether the design produces grouped or ungrouped data, whether cases provide a single response or more than one response, and whether the variables themselves or the residuals of analysis are screened.

Keywords: assumptions; collinearity; distributions; errors; homoscedasticity; outliers; residuals; screening; transformation
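
As a rough illustration of the statistical side of screening, the sketch below computes skewness for a single continuous variable, flags univariate outliers by z-score, and checks whether a log transformation reduces the skew. The data, the function names `skewness` and `flag_outliers`, and the |z| > 3 cutoff are all assumptions made for this example; the appropriate statistics and cutoffs depend on the main analysis and the sample size.

```python
import math
import statistics


def skewness(values):
    """Population skewness: the mean cubed z-score (0 for a symmetric distribution)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def flag_outliers(values, z_cutoff=3.0):
    """Return the values whose absolute z-score exceeds z_cutoff (an assumed cutoff)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [v for v in values if abs((v - mean) / sd) > z_cutoff]


# Hypothetical, positively skewed scores (for instance, response times in ms).
scores = [210, 225, 230, 240, 250, 255, 260, 270, 280, 300,
          320, 350, 400, 520, 1900]

raw_skew = skewness(scores)

# A log transformation is one common remedy for positive skew;
# it requires strictly positive values.
log_scores = [math.log(v) for v in scores]
log_skew = skewness(log_scores)

print(f"skewness before transformation: {raw_skew:.2f}")
print(f"skewness after log transformation: {log_skew:.2f}")
print("univariate outliers (|z| > 3):", flag_outliers(scores))
```

A statistical check of this kind would normally be paired with a graphical one, such as a histogram or normal probability plot, and for some analyses the same checks would be applied to residuals instead of the raw variable.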