Evaluation of histological findings with severity grade, to analyze toxicology in-vivo studies

In-vivo toxicological studies are characterized by multiple primary endpoints measured on quite different scales. Whereas guidelines and publications provide various statistical tests for normally distributed endpoints (such as organ weights) and for proportions (such as tumor rates), few approaches are available for graded histopathological findings, such as 0, +, ++, +++. This is a basic inconsistency in statistical analysis, because these graded findings sometimes show a high predictive value for potential toxic effects. Here we compare different methods, especially with regard to i) designs with very small sample sizes and ii) interpretability by toxicologists. We recommend a new approach in which a simultaneous test is performed over all class combinations of score levels, such as (0, +) vs (++, +++). Corresponding R code is provided by way of a data example.
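The idea of testing over collapsed score classes can be illustrated with a minimal sketch. It is not the paper's R code and not its simultaneous test: it assumes that "class combinations" means every cutpoint dichotomization of the ordered grades (for example grades 0 and + against ++ and +++), it compares one treated group with a control using a one-sided Fisher exact test per cutpoint, and it swaps in a conservative Bonferroni adjustment for the multiplicity correction. The function names and the severity scores (coded 0 to 3) are hypothetical.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are (treated, control); columns are (high grades, low grades).
    Returns P(X >= a) under the hypergeometric null distribution.
    """
    n_treated, n_control = a + b, c + d
    n_high = a + c
    denom = comb(n_treated + n_control, n_high)
    upper = min(n_treated, n_high)
    tail = sum(
        comb(n_treated, k) * comb(n_control, n_high - k)
        for k in range(a, upper + 1)
    )
    return tail / denom


def collapsed_class_tests(control, treated, n_levels=4):
    """Bonferroni-adjusted p-values, one per cutpoint dichotomization.

    For each cut in 1..n_levels-1 the grades are collapsed into
    (score < cut) versus (score >= cut). Bonferroni is an assumption
    of this sketch, used as a simple, conservative adjustment.
    """
    raw = {}
    for cut in range(1, n_levels):
        a = sum(s >= cut for s in treated)   # treated, high grades
        b = len(treated) - a                 # treated, low grades
        c = sum(s >= cut for s in control)   # control, high grades
        d = len(control) - c                 # control, low grades
        raw[cut] = fisher_one_sided(a, b, c, d)
    m = len(raw)
    return {cut: min(1.0, m * p) for cut, p in raw.items()}


# Hypothetical severity scores (0 = none, 1 = +, 2 = ++, 3 = +++).
control_scores = [0, 0, 0, 0, 1]
treated_scores = [1, 2, 2, 3, 3]

adjusted = collapsed_class_tests(control_scores, treated_scores)
for cut, p in adjusted.items():
    print(f"grades < {cut} vs grades >= {cut}: adjusted p = {p:.4f}")
```

Reporting one adjusted p-value per cutpoint keeps the result interpretable for a toxicologist, since each comparison names the severity level at which the groups begin to differ. A procedure that accounts for the correlation between the collapsed comparisons would generally be less conservative than Bonferroni.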
