Bias due to Controlling a Collider: A Potentially Important Issue for Personality Research

I focus on one bias in correlational studies that has rarely been recognised because of the current taboo on discussions of causality in these studies: bias due to controlling a collider. It can not only induce artificial correlations between statistically independent predictors but also suppress or hide real correlations between predictors. If the collider is related to selective sampling, a particularly nasty bias results. Bias due to controlling a collider may be as important as bias due to a suppressor effect. Copyright © 2012 John Wiley & Sons, Ltd.

In his stimulating paper, which is unfortunately sometimes hard to read, Lee (this issue) touches on a taboo topic in current personality publications: causal relations among variables that describe between-person differences. For many years, authors were educated by reviewers and editors to avoid causal language because of the many pitfalls in causal interpretations of correlations. These pitfalls granted, dismissing causality altogether is like throwing out the baby with the bathwater. As humans, we cannot avoid thinking in terms of causality, and therefore, tabooing this topic in publications does not prevent readers and mass media from making their own causal interpretations guided by implicit rules such as ‘A correlates with B’ means ‘A causes B’ but ‘B correlates with A’ means ‘B causes A’. Although causality is a difficult concept in correlational studies, scientists should and actually can do better than this if they can be pressed to explicate the causal model, or alternative causal models, underlying their research questions. The directed acyclic graph (DAG) method described by Lee (this issue) is a valuable method of achieving such an explication (see Foster, 2010, for an excellent discussion of causality based on DAGs for developmental psychologists). My comment here focuses on a key concept in the DAG approach: the collider.
Bias due to explicit control of a collider: Example from research on adaptation

A collider is a joint outcome of two predictors that may be correlated or not. If one statistically controls for a collider, the resulting correlation between the predictors will necessarily be biassed. Although this bias is most often discussed only for the case where the two predictors are uncorrelated, such that the bias consists of a spurious correlation, the bias is in fact general: any correlation will be biassed by the adjustment. As Lee (this issue) has correctly observed, the bias is obvious but rarely noticed by researchers.

For an example, let us consider data on risks and resources for the adaptation of immigrant youth in Greece to the Greek culture (Motti-Stefanidi, Asendorpf, & Masten, in press). Self-efficacy is an important resource, so the association of immigrant status with self-efficacy provides important information. Do these immigrants have lower self-efficacy expectations than their Greek peers? The answer is yes (the zero-order correlation between dummy-coded immigrant status and self-efficacy in a sample of 969 adolescent immigrant students along with their Greek classmates was .15, p < .001). In studies of immigrant adaptation, skills in the host language are often routinely controlled because they may already explain most or all effects of other predictors of adaptation (although in many cases, suppressor effects may occur because the effect of language skills on adaptation is relatively strong). In the aforementioned case, if one controls the correlation between immigrant status and self-efficacy for the ability to speak Greek, the resulting partial correlation is .03 and no longer significant. The control of Greek speaking skills induces a bias due to a collider because these skills are very likely causally influenced by both immigrant status and self-efficacy. Indeed, the respective correlations were .37, p < .001, and .23, p < .001.
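The general mechanism can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not a reanalysis of the Greek data: the variables x and y are hypothetical, statistically independent standard-normal predictors, the collider z is assumed to be their sum plus independent noise, and the function names (pearson, partial_corr, simulate) are introduced here purely for illustration. The partial correlation is computed with the standard first-order formula from the three zero-order correlations.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def simulate(n=20000, seed=1):
    """Return (zero-order r, partial r) for hypothetical simulated data."""
    rng = random.Random(seed)
    # Assumption: two independent predictors.
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]
    # Assumption: the collider is caused by both predictors plus noise.
    z = [a + b + rng.gauss(0, 1) for a, b in zip(x, y)]
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return r_xy, partial_corr(r_xy, r_xz, r_yz)


r_zero_order, r_partial = simulate()
print(f"zero-order r(x, y)  = {r_zero_order:+.3f}")
print(f"partial r(x, y | z) = {r_partial:+.3f}")
```

Under these assumptions, the zero-order correlation is close to zero, whereas the partial correlation is clearly negative (its expected value in this particular setup is -.50), although neither predictor influences the other. If the two predictors are themselves correlated, the same adjustment shifts their correlation, which is how a real association can be shrunk or hidden.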
Thus, controlling for host language skills is highly problematic in studies of the adaptation of immigrants where a resource and/or an adaptation outcome influence these skills because in such cases one controls a collider. If one starts with explicit causal models before decisions are made on the statistical control of variables in the model, one will rarely commit this kind of erroneous overcontrol. But if no causal analysis is made and the models involve many variables, or variables where it is not clear whether they should be considered a predictor or an outcome, researchers can easily get lost in covariation, relying on traditional routines designed for the control of certain predictor variables although they might be outcomes in the present context.

Bias due to implicitly controlling a collider through sampling: Example from research on achievement

If a collider is related to sampling such that the sample of participants is restricted in variation on the collider, this is equivalent to statistically controlling part of the variation of

European Journal of Personality, Eur. J. Pers. (2012). Published online in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/per.1865

[1]  WATER UNDER THE BRIDGE A Response to Bingham , Heywood , and White , 2006 .

[2]  G. Smith Epidemiology, epigenetics and the 'Gloomy Prospect': embracing randomness in population health research and practice. , 2011, International journal of epidemiology.

[3]  Judea Pearl,et al.  Direct and Indirect Effects , 2001, UAI.

[4]  L. Penrose,et al.  THE CORRELATION BETWEEN RELATIVES ON THE SUPPOSITION OF MENDELIAN INHERITANCE , 2022 .

[5]  Olle Melander,et al.  From noncoding variant to phenotype via SORT1 at the 1p13 cholesterol locus , 2010, Nature.

[6]  R. A. Fisher,et al.  The Genetical Theory of Natural Selection , 1931 .

[7]  P. Broderick,et al.  Chromosome 15q25 (CHRNA3-CHRNA5) Variation Impacts Indirectly on Lung Cancer Risk , 2011, PloS one.

[8]  Paul C. D. Johnson,et al.  Association Between Genetic Variants on Chromosome 15q25 Locus and Objective Measures of Tobacco Exposure , 2012, Journal of the National Cancer Institute.

[9]  M. Goddard Genomic selection: prediction of accuracy and maximisation of long term response , 2009, Genetica.

[10]  M. Tobin,et al.  Mendelian Randomisation and Causal Inference in Observational Epidemiology , 2008, PLoS medicine.

[11]  P. Visscher,et al.  GCTA: a tool for genome-wide complex trait analysis. , 2011, American journal of human genetics.

[12]  Keith A. Markus,et al.  Reflective measurement models, behavior domains, and common causes , 2013 .

[13]  J. Pearl Causality: Models, Reasoning and Inference , 2000 .

[14]  Susy Macqueen,et al.  Validity , 1973, Just Algorithms.

[15]  G. Hardin,et al.  The Cybernetics of Competition: A Biologist's View of Society , 2015, Perspectives in biology and medicine.

[16]  H. Hollingworth Personality a psychological interpretation. , 1938 .

[17]  Philippe Jacquart,et al.  On making causal claims: A review and recommendations , 2010 .

[18]  Robert-Jan Palstra,et al.  HERC2 rs12913832 modulates human pigmentation by attenuating chromatin-loop formation between a long-range enhancer and the OCA2 promoter. , 2012, Genome research.

[19]  P. Holland Statistics and Causal Inference , 1985 .

[20]  Natalie M. Myres,et al.  New insights into the Tyrolean Iceman's origin and phenotype as inferred by whole-genome sequencing , 2012, Nature Communications.

[21]  P. Molenaar A Manifesto on Psychology as Idiographic Science: Bringing the Person Back Into Scientific Psychology, This Time Forever , 2004 .

[22]  K. Lamb The Genetical Theory of Natural Selection A Complete Variorum Edition , 2000 .

[23]  Philip L. F. Johnson,et al.  A Draft Sequence of the Neandertal Genome , 2010, Science.

[24]  Joseph Hilbe,et al.  Data Analysis Using Regression and Multilevel/Hierarchical Models , 2009 .

[25]  R. Punnett,et al.  The Genetical Theory of Natural Selection , 1930, Nature.

[26]  D. A. Kenny,et al.  Correlation and Causation. , 1982 .

[27]  M. Blows A tale of two matrices: multivariate approaches in evolutionary biology , 2007, Journal of evolutionary biology.

[28]  Ulrich Trautwein,et al.  Military Training and Personality Trait Development , 2012, Psychological science.

[29]  Elias Bareinboim,et al.  Transportability across studies: A formal approach , 2011 .

[30]  J. Wildgen,et al.  "Broken windows" and the risk of gonorrhea. , 2000, American journal of public health.

[31]  Achim Klenke,et al.  Probability theory - a comprehensive course , 2008, Universitext.

[32]  F. Thoemmes,et al.  Theory and Analysis of Total, Direct, and Indirect Causal Effects , 2014, Multivariate behavioral research.

[33]  Rory A. Fisher,et al.  AVERAGE EXCESS AND AVERAGE EFFECT OF A GENE SUBSTITUTION , 1941 .

[34]  D. A. Kenny,et al.  The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. , 1986, Journal of personality and social psychology.

[35]  Philip L. F. Johnson,et al.  Genetic history of an archaic hominin group from Denisova Cave in Siberia , 2010, Nature.

[36]  A. Caspi,et al.  The Power of Personality: The Comparative Validity of Personality Traits, Socioeconomic Status, and Cognitive Ability for Predicting Important Life Outcomes , 2007, Perspectives on psychological science : a journal of the Association for Psychological Science.

[37]  P. Costa,et al.  Revised NEO Personality Inventory (NEO-PI-R) and NEO-Five-Factor Inventory (NEO-FFI) , 1992 .

[38]  A. Krogh,et al.  Ancient human genome sequence of an extinct Palaeo-Eskimo , 2010, Nature.

[39]  R. P. McDonald,et al.  Behavior Domains in Theory and in Practice , 2003 .

[40]  L. Cronbach,et al.  Construct validity in psychological tests. , 1955, Psychological bulletin.

[41]  Tom Burr,et al.  Causation, Prediction, and Search , 2003, Technometrics.

[42]  J. Heckman,et al.  Causality and Econometrics , 2022, SSRN Electronic Journal.

[43]  D. Mackinnon Introduction to Statistical Mediation Analysis , 2008 .

[44]  J. Box Commentary: On RA Fisher’s Bateson lecture on statistical methods in genetics , 2010 .

[45]  J. Pearl The Causal Foundations of Structural Equation Modeling , 2012 .

[46]  A. Goldberger,et al.  Structural Equation Models in the Social Sciences. , 1974 .

[47]  R. McCrae Integrating the Levels of Personality , 1996 .

[48]  N. Timpson,et al.  Mendelian Randomization: Application to Cardiovascular Disease , 2012, Current Hypertension Reports.

[49]  Arthur S. Goldberger,et al.  Structural Equation Models in the Social Sciences. , 1974 .

[50]  D. Hibbs On analyzing the effects of policy interventions : Box-Jenkins and Box-Tiao vs. structural equation models , 1977 .

[51]  Bogdan Draganski,et al.  Neuroplasticity: Changes in grey matter induced by training , 2004, Nature.

[52]  Jordan B Peterson,et al.  Between facets and domains: 10 aspects of the Big Five. , 2007, Journal of personality and social psychology.

[53]  G. Wagner The character concept in evolutionary biology , 2001 .

[54]  A. Weismann The all-sufficiency of natural selection : a reply to Herbert Spencer , 1893 .

[55]  Eric D. Heggestad,et al.  Intelligence, personality, and interests: evidence for overlapping traits. , 1997, Psychological bulletin.

[56]  Peter E. Kennedy A Guide to Econometrics , 1979 .

[57]  P. White Ideas About Causation in Philosophy and Psychology , 1990 .

[58]  Stephan Ripke,et al.  Estimating the proportion of variation in susceptibility to schizophrenia captured by common SNPs , 2012, Nature Genetics.

[59]  D. Carmelli,et al.  Twenty-four year mortality in World War II US male veteran twins discordant for cigarette smoking. , 1996, International journal of epidemiology.

[60]  RA Fisher, statistician and geneticist extraordinary: a personal view. , 2003, International journal of epidemiology.

[61]  M. Rutter,et al.  Proceeding From Observed Correlation to Causal Inference: The Use of Natural Experiments , 2007, Perspectives on psychological science : a journal of the Association for Psychological Science.

[62]  Frosso Motti-Stefanidi,et al.  The adaptation and well-being of adolescent immigrants in Greek schools: A multilevel, longitudinal study of risks and resources , 2012, Development and Psychopathology.

[63]  S. Ebrahim,et al.  'Mendelian randomization': can genetic epidemiology contribute to understanding environmental determinants of disease? , 2003, International journal of epidemiology.

[64]  J. Kaprio,et al.  Twins, smoking and mortality: a 12-year prospective study of smoking-discordant twin pairs. , 1989, Social science & medicine.

[65]  P. Meehl,et al.  A funny thing happened to us on the way to the latent entities. , 1979, Journal of personality assessment.

[66]  J. Pearl The Causal Mediation Formula—A Guide to the Assessment of Pathways and Mechanisms , 2012, Prevention Science.

[67]  R. Lewontin ‘The Selfish Gene’ , 1977, Nature.

[68]  D. Lawlor,et al.  Clustered Environments and Randomized Genes: A Fundamental Distinction between Conventional and Genetic Epidemiology , 2007, PLoS medicine.

[69]  L. Cronbach The two disciplines of scientific psychology. , 1957 .

[70]  A. Cooper,et al.  Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. , 2012, Journal of personality and social psychology.

[71]  P. Meehl Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. , 1978 .

[72]  F. Thoemmes,et al.  A Systematic Review of Propensity Score Methods in the Social Sciences , 2011, Multivariate behavioral research.

[73]  Causal Linear Stochastic Dependencies: The Formal Theory , 1984 .

[74]  Peter C. M. Molenaar,et al.  The integrated trait–state model , 2007 .

[75]  S. Wright,et al.  The Relative Importance of Heredity and Environment in Determining the Piebald Pattern of Guinea-Pigs. , 1920, Proceedings of the National Academy of Sciences of the United States of America.

[76]  John S. Heywood,et al.  Evaluating Schools and Teachers Based On Student Performance , 1991 .

[77]  George Davey Smith,et al.  Mendelian Randomization for Strengthening Causal Inference in Observational Studies , 2010, Perspectives on psychological science : a journal of the Association for Psychological Science.

[78]  J. Pearl,et al.  EIGHT MYTHS ABOUT CAUSALITY AND STRUCTURAL EQUATION MODELS , 2013 .

[79]  M. Eichler Granger causality and path diagrams for multivariate time series , 2007 .

[80]  Timothy C. Bates,et al.  From left to right: how the personality system allows basic traits to influence politics via characteristic moral adaptations. , 2011, British journal of psychology.

[81]  Elias Bareinboim,et al.  Controlling Selection Bias in Causal Inference , 2011, AISTATS.

[82]  Kurt Gödel,et al.  On Formally Undecidable Propositions of Principia Mathematica and Related Systems , 1966 .

[83]  K. Kendler,et al.  Dependent stressful life events and prior depressive episodes in the prediction of major depression: the problem of causal inference in psychiatric epidemiology. , 2010, Archives of general psychiatry.

[84]  Simon C. Potter,et al.  Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls , 2007, Nature.

[85]  Angela L. Duckworth,et al.  Personality Psychology and Economics , 2011, SSRN Electronic Journal.

[86]  Giovanni Montana,et al.  Statistical methods in genetics , 2006, Briefings Bioinform..

[87]  Michael I. Jordan Graphical Models , 2003 .

[88]  George Davey Smith,et al.  Capitalizing on Mendelian randomization to assess the effects of treatments. , 2007, Journal of the Royal Society of Medicine.

[89]  H.L.J. van der Maas,et al.  A dynamical model of general intelligence: the positive manifold of intelligence by mutualism. , 2006, Psychological review.

[90]  seguindo,et al.  INFERENCE TO THE BEST EXPLANATION , 2004 .

[91]  Michael R. Genesereth,et al.  Introduction to Logic, Second Edition , 2013, Introduction to Logic.

[92]  Gerome Breen,et al.  Genetic Variation , 2020, Population Genetics with R.

[93]  Howard B. Lee,et al.  Foundations of Behavioral Research , 1973 .

[94]  A. Philip Dawid,et al.  Beware of the DAG! , 2008, NIPS Causality: Objectives and Assessment.

[95]  Brian W. Junker,et al.  Tail-measurability in monotone latent variable models , 1997 .

[96]  Wendy Johnson,et al.  Genetic and environmental influences on behavior: capturing all the interplay. , 2007, Psychological review.

[97]  Jun Zhu,et al.  Increasing the Power to Detect Causal Associations by Combining Genotypic and Expression Data in Segregating Populations , 2007, PLoS Comput. Biol..

[98]  Judea Pearl,et al.  Causal Inference , 2010 .