Rationality in psychological research: The good-enough principle.

" This article reexamines a number of methodological and procedural issues raised by Meehl (1967, 1978) that seem to question the rationality of psychological inquiry. The first issue concerns the asymmetry in theory testing between psychology and physics and the resulting paradox that, because the psychological null hypothesis is always false, increases in precision in psychology always lead to weaker tests of a theory, whereas the converse is true in physics. The second issue, related to the first, regards the slow progress observed in psychological research and the seeming unwillingness of social scientists to take seriously the Popperian requirements for intellectual honesty. We propose a good-enough principle to resolve Meehl's methodological paradox and appeal to a more powerful reconstruction of science developed by Lakatos (1978a, 1978b) to account for the actual practice of psychological researchers. From time to time every research discipline must reevaluate its method for generating and certifying knowledge. The actual practice of working scientists in a discipline must continually be subjected to severe criticism and be held accountable to standards of intellectual honesty, standards that are themselves revised in light of critical appraisal (Lakatos, 1978a). If, on a metatheoretical level, scientific methodology cannot be defended on rational grounds, then metatheory must be reconstructed so as to make science rationally justifiable. The history of science is replete with numerous such reconstructions, from the portrayal of science as being inductive and justificationist, to the more recent reconstructions favored by (naive and sophisticated) methodological falsificationists, such as Popper (1959), Lakatos (1978a), and Zahar (1973). In the last two decades psychology, too, has been subjected to criticism for its research methodology. 
Of increasing concern is empirical psychology's use of inferential hypothesis-testing techniques and the way in which the information derived from these procedures is used to help us make decisions about the theories under test (e.g., Bakan, 1966; Lykken, 1968; Rozeboom, 1960). In two penetrating essays, Meehl (1967, 1978) has cogently and effectively faulted the use of the traditional null-hypothesis significance test in psychological research. According to Meehl (1978, p. 817), "the almost universal reliance on merely refuting the null hypothesis as the standard method for corroborating substantive theories [in psychology] is a terrible mistake, is basically unsound, poor scientific strategy, and one of the worst things that ever happened in the history of psychology." He maintained that it leads to a methodological paradox when compared to theory testing in physics. In addition, Meehl (1978) pointed to the apparently slow progress in psychological research and the deleterious effect that null-hypothesis testing has had on the detection of progress in the accumulation of psychological knowledge. The cumulative effect of this criticism is to do nothing less than call into question the rational character of our empirical inquiries. As yet there has been no attempt to deal with the problems raised by Meehl by reconstructing the actual practice of psychologists into a logically defensible form. This is the purpose of the present article. The two articles by Meehl seem to deal with two disparate issues--null-hypothesis testing and slow progress. Both issues, however, are linked in the methodological falsificationist reconstruction of science to the necessity for scientists to agree on what experimental outcomes are to be considered as disconfirming instances. 
We will argue that the methodological paradox can be ameliorated with the help of a "good-enough" principle, to be proposed here, so that hypothesis testing in psychology is not rationally disadvantaged when compared to physics. We will also account for the apparent slow progress in psychological research, and we will take issue with certain (though not all) claims made by Meehl (1978) in this regard. Both the methodological and the progress issues will be resolved by an appeal to the (sophisticated) methodological falsificationist reconstruction of science developed by Lakatos (1978a), an approach with which Meehl is familiar but one he did not apply to psychology in his articles.

[American Psychologist, January 1985, Vol. 40, No. 1, 73-83. Copyright 1985 by the American Psychological Association, Inc. 0003-066X/85/$00.75]

Meehl's Asymmetry Argument

Let us develop Meehl's argument. It is his contention that improved measurement precision has widely different effects in psychology and physics on the success of a theory in overcoming an "observational hurdle." Perfect precision in the behavioral sciences provides an easier hurdle for theories, whereas such accuracy in physics makes it much more difficult for a theory to survive. According to the Popperian reconstruction of science (Popper, 1959), scientific theories must be continually subjected to severe tests. But if the social sciences are inherently incapable of generating such tests, if they cannot expose their theories to the strongest possible threat of refutation even with ever-increasing measurement precision, then their claim to scientific status might reasonably be questioned. Further, according to this view of research in the social sciences, there can be no question of scientific progress based on the rational consideration of experimental outcomes. Instead, progress is more a matter of psychological conversion (Kuhn, 1962). Let us look more closely at the standard practice in psychology.
[An earlier version of this article was read at the 1983 meeting of the American Educational Research Association. The authors are grateful to Robbie Case, Joel R. Levin, and Leonard Marascuilo for reading earlier drafts, and to Crescent L. Kringle for her help with the manuscript. Requests for reprints should be sent to Ronald C. Serlin, Department of Educational Psychology, University of Wisconsin, Madison, Wisconsin 53706.]

On the basis of some theory T we derive the conclusion that a parameter θ will differ for two populations. In order to examine this conclusion, we can set up a point-null hypothesis, H0: θ = 0, and test this hypothesis against the predicted outcome, H1: θ ≠ 0. However, it has also been recognized (Kaiser, 1960; Kimmel, 1957) that another question of interest is whether the difference is in a certain direction, and so we could instead test the directional null hypothesis, H0*: θ ≤ 0, against the directional alternative, H1*: θ > 0. In such tests we can make two types of errors. The Type I error would lead to rejecting H0 or H0* when they are indeed true, whereas the Type II error involves not rejecting H0 or H0* when they are false. The conventional methodology sets the Type I (or alpha) error rate at 5% and seeks to reduce the frequency of Type II errors. Such a reduction in the Type II error rate can be achieved by improving the logical structure of the experiment, reducing measurement errors, or increasing sample size. Meehl pointed out that in the behavioral sciences, because of the large number of factors affecting variables, we would never expect two populations to have literally equal means. Hence, he concluded that the point-null hypothesis is always false. With infinite precision, we would always reject H0. This is perhaps one reason to prefer the directional null hypothesis H0*. But Meehl then conducted a thought experiment in which the direction predicted by T was assigned at random.
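Meehl's conclusion that the point null is always rejected given enough precision can be checked analytically: for any fixed nonzero θ, the power of the conventional two-sided test approaches 1 as sample size grows. A minimal sketch in Python, using the normal approximation to the two-sample test (the values θ = 0.01 and σ = 1 are invented for illustration, not taken from the article):

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def two_sided_power(theta, sigma, n, alpha=0.05):
    """Approximate power of the two-sided two-sample z-test of H0: theta = 0,
    with n observations per group and common standard deviation sigma."""
    z_crit = 1.959963984540054           # upper 2.5% point of N(0, 1)
    se = sigma * math.sqrt(2.0 / n)      # standard error of the mean difference
    shift = theta / se
    return normal_cdf(-z_crit + shift) + normal_cdf(-z_crit - shift)

# A "negligible" true difference still makes H0 false, so the rejection
# probability climbs from near alpha toward 1 as n grows:
for n in (100, 10_000, 1_000_000, 100_000_000):
    print(n, round(two_sided_power(theta=0.01, sigma=1.0, n=n), 4))
```

With n = 100 the power is barely above the 5% alpha level; by n in the hundreds of millions it is effectively 1, which is the sense in which "infinite precision" guarantees rejection of the point null.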
In such an experiment, T provides no logical connection to the predicted direction and so is totally without merit. Because H0 is always false, the two populations will always differ, but because the direction in H1* is assigned at random, with infinite precision we will reject H0* half of the time. Hence, Meehl concluded "that the effect of increased precision . . . is to yield a probability approaching 1/2 of corroborating our substantive theory by a significance test, even if the theory is totally without merit" (Meehl, 1967, p. 111, emphasis in original). Meehl contrasted this state of affairs with that in physics, wherein the usual situation involves the prediction of a point value. That which corresponds to the point-null hypothesis is the value flowing as a consequence of a substantive theory T. An increase in statistical power in physics has the effect of stiffening the experimental hurdle by "decreasing the prior probability of a successful experimental outcome if the theory lacks verisimilitude, that is, precisely the reverse of the situation obtaining in the social sciences" (Meehl, 1967, p. 113). With infinite precision, and if the theory has no merit, the logical probability of its surviving such a test in physics is negligible; in the social sciences, this logical probability for H0* is one half.

Perhaps another way of describing the asymmetry in hypothesis testing between psychology and physics is to note that, in psychology, the point-null hypothesis is not what is derived from a substantive theory. Rather, it is a "straw-man" competitor whose rejection we interpret as increasing the plausibility of T. In physics, on the other hand, theories that entail point-null statistical hypotheses are the very ones physicists take seriously and hope to confirm. If O is a predicted outcome of interest, and Ō is its logical complement, then the depiction of null and alternative statistical hypotheses in the two disciplines can be written as follows:

    Psychology:  H0: Ō  versus  H1: O
    Physics:     H0: O  versus  H1: Ō
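The one-half limit, and the repair previewed in the abstract, can both be sketched numerically. Under Meehl's thought experiment the predicted direction is a coin flip, so the chance of corroborating the meritless theory is the average of the rejection probabilities under a correct and an incorrect guess; replacing the point null with a range null H0: θ ≤ Δ, for some minimally important difference Δ, removes this advantage. A sketch under invented illustrative values (θ, σ = 1, and Δ = 0.2 are assumptions for the example; the formal good-enough principle is developed later in the article):

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def one_sided_power(theta, sigma, n, alpha=0.05):
    """Power of the one-sided z-test that rejects for large positive differences."""
    z_crit = 1.6448536269514722          # upper 5% point of N(0, 1)
    se = sigma * math.sqrt(2.0 / n)
    return normal_cdf(-z_crit + theta / se)

def corroboration_prob(theta, sigma, n):
    """Meehl's thought experiment: the true difference theta is small but
    nonzero, and the direction predicted by the meritless theory T is a
    coin flip, so average the correct-guess and wrong-guess rejection rates."""
    return 0.5 * (one_sided_power(theta, sigma, n)
                  + one_sided_power(-theta, sigma, n))

# As precision grows this approaches 1/2, as Meehl argued:
print(corroboration_prob(0.01, 1.0, 100_000_000))

def good_enough_power(theta, delta, sigma, n):
    """Range ('good-enough') null H0: theta <= delta: reject only when the
    estimate credibly exceeds a minimally important difference delta."""
    z_crit = 1.6448536269514722
    se = sigma * math.sqrt(2.0 / n)
    return normal_cdf(-z_crit + (theta - delta) / se)

# A trivial true difference (0.01) below delta = 0.2 no longer yields
# rejection however large n becomes; a substantial one (0.5) still does.
print(good_enough_power(0.01, 0.2, 1.0, 100_000_000))
print(good_enough_power(0.50, 0.2, 1.0, 100_000_000))
```

Under the range null, increased precision once again makes the hurdle harder rather than easier for a theory without merit, which is the sense in which the good-enough principle dissolves the asymmetry.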

[1]  R. Serlin,et al.  On the Alleged Degeneration of the Kohlbergian Research Program , 1984 .

[2]  R. Campbell,et al.  Problems in the Theory of Developmental Sequences , 1983 .

[3]  Equilibration: Developing the Hard Core of the Piagetian Research Program. , 1983 .

[4]  Robert L. Campbell,et al.  Problems in the Theory of Developmental Sequences. Prerequisites and Precursors. , 1983 .

[5]  B. Puka An Interdisciplinary Treatment of Kohlberg , 1982, Ethics.

[6]  G. Glass,et al.  Meta-analysis in social research , 1981 .

[7]  Carol Gilligan,et al.  Moral development in late adolescence and adulthood: A critique and reconstruction of Kohlberg's theory. , 1980 .

[8]  James R. Rest Development in Judging Moral Issues , 1979 .

[9]  Stage Acquisition and Stage Use , 1979 .

[10]  J C Gibbs Kohlberg's moral stage theory: a Piagetian revision. , 1979, Human development.

[11]  P. Meehl Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. , 1978 .

[12]  Mark H. Bickhard,et al.  The Nature of Developmental Stages , 1978 .

[13]  James V. Bradley,et al.  Probability, decision, statistics , 1976 .

[14]  I. Lakatos Falsification and the Methodology of Scientific Research Programmes , 1976 .

[15]  W. Damon Early Conceptions of Positive Justice as Related to the Development of Logical Operations. , 1975 .

[16]  P. Urbach Progress and Degeneration in the ‘IQ Debate’ (II) , 1974, The British Journal for the Philosophy of Science.

[17]  P. Urbach Progress and Degeneration in the ‘IQ Debate’ (I)* , 1974, The British Journal for the Philosophy of Science.

[18]  E. Zahar Why did Einstein's Programme supersede Lorentz's? (I)* , 1973, The British Journal for the Philosophy of Science.

[19]  E. Zahar Why did Einstein's Programme supersede Lorentz's? (II) , 1973, The British Journal for the Philosophy of Science.

[20]  L. Kohlberg Continuities in Childhood and Adult Moral Development Revisited , 1973 .

[21]  G. William Walster,et al.  Statistical Significance as a Decision Rule , 1970 .

[22]  D. Lykken Statistical significance in psychological research. , 1968, Psychological bulletin.

[23]  I. Lakatos Changes in the Problem of Inductive Logic , 1968 .

[24]  P. Meehl Theory-Testing in Psychology and Physics: A Methodological Paradox , 1967, Philosophy of Science.

[25]  D. Bakan,et al.  The test of significance in psychological research. , 1966, Psychological bulletin.

[26]  T. Kuhn,et al.  The Structure of Scientific Revolutions. , 1964 .

[27]  W. W. Rozeboom The fallacy of the null-hypothesis significance test. , 1960, Psychological bulletin.

[28]  H. Kaiser,et al.  Directional statistical decisions. , 1960, Psychological review.

[29]  H. Kimmel,et al.  Three criteria for the use of one-tailed tests. , 1957, Psychological bulletin.

[30]  J. L. Hodges,et al.  Testing the Approximate Validity of Statistical Hypotheses , 1954 .

[31]  J. Neyman Basic Ideas and Some Recent Results of the Theory of Testing Statistical Hypotheses , 1942 .

[32]  R. A. Fisher  Statistical Methods for Research Workers , 1937 .

[33]  L. Frank The Society for Research in Child Development , 1935 .

[34]  K. Popper  The Logic of Scientific Discovery , 1959 .

[35]  J. Wolfowitz,et al.  An Introduction to the Theory of Statistics , 1951, Nature.