The meaning of significant mean group differences for biomarker discovery

Over the past decade, biomarker discovery has become a key goal in psychiatry, intended to support more reliable diagnosis and prognosis of heterogeneous psychiatric conditions and the development of tailored therapies. Nevertheless, the prevailing statistical approach is still the mean group comparison between “cases” and “controls,” which tends to ignore within-group variability. In this educational article, we used empirical data simulations to investigate how effect size, sample size, and the shape of distributions affect the interpretation of mean group differences for biomarker discovery. We then applied these statistical criteria to evaluate biomarker discovery in one area of psychiatric research: autism research. Across the most influential areas of autism research, effect size estimates ranged from small (d = 0.21, anatomical structure) to medium (d = 0.36, electrophysiology; d = 0.5, eye-tracking) to large (d = 1.1, theory of mind). We show that in normal distributions, this translates to approximately 45% to 63% of cases performing within 1 standard deviation (SD) of the typical range; that is, they do not have a deficit/atypicality in a statistical sense. For a measure to have diagnostic utility, defined as 80% sensitivity and 80% specificity, a Cohen’s d of 1.66 is required, and even then 40% of cases fall within 1 SD. However, in both normal and nonnormal distributions, 1 (skewness) or 2 (platykurtic, bimodal) biologically plausible subgroups may exist despite small or even nonsignificant mean group differences. This conclusion contrasts sharply with the way mean group differences are frequently reported. Over 95% of studies omitted the qualifier “on average” when summarising their findings in their abstracts (“autistic people have deficits in X”), which can be misleading because it implies that the group-level difference applies to all individuals in that group. We outline practical approaches and steps for researchers to explore mean group comparisons for the discovery of stratification biomarkers.
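
The link between Cohen's d, the share of cases inside the typical range, and diagnostic accuracy can be illustrated with a minimal sketch. It is not the authors' simulation code. It assumes two equal-variance normal distributions (controls as N(0, 1), cases as N(d, 1)), takes "within 1 SD" to mean within 1 control SD on either side of the control mean, and places the diagnostic cut-off at the midpoint d/2. All function names are hypothetical. Under these assumptions d = 0.5 and d = 1.1 give roughly 62% and 44% of cases within 1 SD; figures obtained from empirical simulations or from a different definition of the typical range may differ.

```python
from statistics import NormalDist

# Assumption: controls follow a standard normal distribution, N(0, 1).
STD_NORMAL = NormalDist()


def fraction_within_one_sd(d: float) -> float:
    """Share of cases within +/-1 control SD of the control mean.

    Assumes cases ~ N(d, 1), i.e. the same variance as controls.
    """
    cases = NormalDist(mu=d, sigma=1.0)
    return cases.cdf(1.0) - cases.cdf(-1.0)


def accuracy_at_midpoint(d: float) -> float:
    """Sensitivity (equal to specificity) for a cut-off placed at d / 2."""
    return STD_NORMAL.cdf(d / 2.0)


def d_for_accuracy(target: float) -> float:
    """Cohen's d at which sensitivity and specificity both equal `target`."""
    return 2.0 * STD_NORMAL.inv_cdf(target)


if __name__ == "__main__":
    # Effect sizes quoted in the abstract.
    for d in (0.21, 0.36, 0.5, 1.1, 1.66):
        print(
            f"d = {d:4.2f}: "
            f"{fraction_within_one_sd(d):.1%} of cases within 1 SD, "
            f"sensitivity = specificity = {accuracy_at_midpoint(d):.1%}"
        )
    print(f"d for 80%/80% in this model: {d_for_accuracy(0.80):.2f}")
```

Using only the standard library's `statistics.NormalDist` keeps the sketch dependency-free; a nonnormal (skewed or bimodal) case distribution would need a simulation rather than these closed-form expressions.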
