The Analysis and Interpretation of Multivariate Data for Social Scientists

Based on a long-running course for master's-level students at the London School of Economics and Political Science, where the authors are based, this text concentrates on multivariate methods suited to social science problems that involve correlational rather than causal relationships. Chapters with application examples and further readings cover data preliminaries, cluster analysis, multidimensional scaling, correspondence analysis, principal components analysis, factor analysis, and latent variable methods. While the mathematical demands are minimal, the methods require a computer software package; an auxiliary website supplies data sets and code for use with SPSS.
