Alternatives to statistical decision trees in regulatory (eco-)toxicological bioassays

The goal of (eco-)toxicological testing is to experimentally establish a dose– or concentration–response relationship and to identify a threshold at which the deviation from “normal” is biologically relevant and probably non-random. Statistical tests aid this process. Most statistical tests rely on distributional assumptions that must be satisfied for reliable performance. Therefore, most statistical analyses of (eco-)toxicological bioassays apply pre-tests (assumption tests) in sequence to select the most appropriate main test, an approach known as a statistical decision tree. This approach has several deficiencies, however, which relate to study design, the type of tests used, and sequential statistical testing in general. When multiple comparisons are used to identify a non-random change against the negative control, we propose robust testing, which can be applied generically without the need for decision trees. Visualization techniques and reference ranges also offer advantages over the current pre-testing approaches. We aim to promulgate these concepts in the (eco-)toxicological community and to initiate a discussion on regulatory acceptance.
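To make the idea of a single pre-specified analysis more concrete, the following minimal sketch compares each dose group with the negative control without any normality or variance-homogeneity pre-test. It is an illustration only and not the specific robust procedure the authors propose: as a stand-in it uses an exact two-sided permutation test on the difference in means, with a Bonferroni adjustment for the number of comparisons. The function names, the group labels, and the data values are all hypothetical and made up for this example.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p(control, treated):
    """Exact two-sided permutation p-value for a difference in means.

    Enumerates every way of relabelling the pooled observations, so it is
    only practical for the small group sizes typical of bioassays.
    """
    pooled = list(control) + list(treated)
    n_control = len(control)
    n_treated = len(pooled) - n_control
    observed = abs(mean(treated) - mean(control))
    total = sum(pooled)
    hits = 0
    count = 0
    for idx in combinations(range(len(pooled)), n_control):
        ctrl_sum = sum(pooled[i] for i in idx)
        diff = abs((total - ctrl_sum) / n_treated - ctrl_sum / n_control)
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            hits += 1
        count += 1
    return hits / count


def compare_to_control(groups, control_key):
    """Bonferroni-adjusted p-values for every group versus the control."""
    control = groups[control_key]
    others = [key for key in groups if key != control_key]
    m = len(others)  # number of comparisons against the control
    return {
        key: min(1.0, m * exact_permutation_p(control, groups[key]))
        for key in others
    }


# Hypothetical data, invented purely for illustration.
data = {
    "control": [10.1, 9.8, 10.3, 10.0, 9.9],
    "low": [10.0, 10.2, 9.7, 10.1, 9.9],
    "high": [12.0, 12.4, 11.8, 12.9, 12.2],
}
adjusted = compare_to_control(data, "control")
for dose, p_value in adjusted.items():
    print(f"{dose} vs control: adjusted p = {p_value:.4f}")
```

The point of the sketch is the workflow rather than the particular test: the same procedure is applied to every endpoint, so the choice of the main test does not depend on the outcome of earlier tests on the same data. The Bonferroni adjustment is conservative; procedures that account for the correlation between comparisons against a shared control would generally be preferred in practice.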
