Data-Dredging Procedures in Survey Analysis

Introduction and Summary

1. It is a commonplace of the statistical design of experiments that the hypotheses to be tested should be formulated before examining the data that are to be used to test them. Even in experimental situations, this is sometimes not possible, and in the last decade or so some progress has been made toward the development of more flexible testing procedures which allow the data to be dredged for hypotheses in certain ways. In survey analysis, which is commonly exploratory, it is rare for precise hypotheses to be formulable independently of the data. It follows that normally no precise probabilistic interpretations can validly be given to relationships found among the survey variables. In practice, this has not prevented survey practitioners from reporting probability levels as if they were precisely meaningful. Most investigators are so accustomed to making probability statements that a survey report looks naked without them, but we fear that many survey reports are wearing the Emperor's clothes. This paper offers a classification of data-dredging procedures and some comments on their use.
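The loss of probabilistic meaning can be illustrated with a small numerical sketch. Suppose, purely as an assumption for illustration, that an analyst examines k relationships, that none of them is real, and that the k tests are independent. The chance that at least one reaches a nominal level alpha is then 1 - (1 - alpha)^k, and a simulation that reports only the smallest p-value reproduces that figure. The function names `familywise_error_rate` and `simulate_dredging` are hypothetical, and real survey variables are rarely independent, so the numbers indicate the direction of the effect rather than its exact size.

```python
import random


def familywise_error_rate(k: int, alpha: float) -> float:
    """Chance that at least one of k independent true-null tests falls below alpha."""
    return 1.0 - (1.0 - alpha) ** k


def simulate_dredging(k: int, alpha: float, trials: int, seed: int = 0) -> float:
    """Fraction of simulated surveys whose smallest of k null p-values is below alpha.

    Assumption: under a true null hypothesis with a continuous test
    statistic, each p-value is uniform on [0, 1), so random.random()
    stands in for one test.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # The "dredged" result: keep only the most striking of the k tests.
        smallest = min(rng.random() for _ in range(k))
        if smallest < alpha:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for k in (1, 5, 20, 100):
        exact = familywise_error_rate(k, 0.05)
        approx = simulate_dredging(k, 0.05, trials=20_000)
        print(f"k={k:3d}  exact={exact:.3f}  simulated={approx:.3f}")
```

Under these assumptions, twenty relationships examined at the 5 percent level give a chance of about 0.64 that at least one appears "significant," although the reported level for that one relationship would still read 0.05.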