The problem of multiple inference in studies designed to generate hypotheses.

Epidemiologic research often involves the simultaneous assessment of associations between many risk factors and several disease outcomes. In such situations, which are often designed to generate hypotheses, multiple univariate hypothesis testing is not an appropriate basis for inference. The number of true positive associations in a collection of many associations can be estimated by comparing the observed distribution of p values for the positive associations to a theoretical uniform distribution, to the observed distribution of negative associations, or to an empiric randomization distribution. None of these approaches, however, will distinguish the true from the false positive associations. The authors consider various criteria for selecting a subset of associations to report, including Bonferroni adjustment of p values, splitting the sample for searching and testing, Bayesian inference, and decision theory. The authors prefer an approach in which all associations in the data are reported, whether significant or not, followed by a ranking in order of priority for investigation using empirical Bayes techniques. The methods are illustrated by application to preliminary data from a study aimed at identifying hitherto unsuspected occupational carcinogens.
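
The comparison of observed p values with a uniform distribution can be sketched in a few lines. The following is a minimal illustration, not the authors' procedure: it assumes a simple threshold-based estimator (p values above a cutoff `lam` are treated as coming almost entirely from null associations), and the function names, the cutoff, and the example p values are all hypothetical. A Bonferroni filter is included for contrast. As the abstract notes, the first function estimates only how many associations are real, not which ones.

```python
def estimate_true_associations(p_values, lam=0.5):
    """Estimate how many tested associations are non-null.

    Assumption: null p values are uniform on [0, 1], so roughly
    (1 - lam) * m0 of the m0 null p values should exceed lam.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")
    m = len(p_values)
    n_above = sum(1 for p in p_values if p > lam)
    # Estimated number of true null associations, capped at m.
    m0_hat = min(m, n_above / (1.0 - lam))
    return m - m0_hat


def bonferroni_significant(p_values, alpha=0.05):
    """Return indices of p values that survive Bonferroni adjustment."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p * m <= alpha]


# Made-up p values, for illustration only.
example_p = [0.001, 0.004, 0.01, 0.02, 0.3, 0.6, 0.7, 0.9, 0.45, 0.8]
n_true_hat = estimate_true_associations(example_p)
reported = bonferroni_significant(example_p)
```

The estimate depends on the choice of `lam`, and Bonferroni adjustment will typically flag fewer associations than the estimated count of real ones, which is one reason a ranking approach may be preferred in hypothesis-generating studies.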
