Over the past 10 years, there has been an explosion of elegant and increasingly sophisticated multiple comparison procedures (MCPs) for handling multiple endpoints in clinical trials [1–5]. The explosion is predicated on the concept of strong control, which is satisfied if the probability of rejecting any (i.e., one or more) true null hypothesis among the universe of primary and prespecified secondary hypotheses (sometimes even the tertiary hypotheses) is at most 2.5% (one-sided). The desire for strong control is rooted in the belief that it could lead to more commercially favorable labels, or at least place a sponsor in a better position to negotiate the inclusion of some important secondary (and even tertiary) endpoints in a product label. An unintended consequence of strong control implemented in this fashion is the blurring of the lines between primary, secondary, and tertiary endpoints. This blurring led Snapinn and Jiang [6] to conclude that it is time to let go of these designations and simply define a set of endpoints for which strong control is guaranteed, with any significant result within this set being equally valid. Although we can certainly understand the logic behind Snapinn and Jiang's conclusion from the statistical perspective, getting rid of the primary-secondary-tertiary designation ignores the clinical implications of these endpoints and how an intervention is generally assessed. For example, a clinician is usually interested in secondary endpoints if a new treatment appears to have an effect on the primary endpoint(s). Not surprisingly, clinicians often want to know the treatment effect on certain secondary endpoints, especially those related to clinical outcome, regardless of their membership in the set proposed by Snapinn and Jiang.
At a European Statistical Meeting on Subgroup Analyses organized by the European Federation of Statisticians in the Pharmaceutical Industry on 30 Nov 2012, Rob Hemmings [7] commented on the importance of bringing all scientific evidence to bear on decision making and noted that in many instances it was appropriate to give greater weight to biological/pharmacological plausibility and replication of evidence, including external evidence on related products, than to individual p-values. Even though Hemmings' comment was made in the context of subgroup analysis, which is another source of multiplicity, we feel the comment is equally applicable to secondary endpoints. In a recent presentation on the update of the European Medicines Agency (EMA) multiplicity guideline, Benda [8] commented that clinical assessment often ignores design. So, although statisticians are busy developing and implementing the next generation of MCPs that offer strong control over multiple endpoints, there has been relatively little self-examination in the statistical literature [9, 10] of the relevance of applying stringent strong control to secondary endpoints. The decision by the EMA to update its guidance document [11] and hold public workshops [12] highlights that some controversy remains within both the statistical and clinical communities. The rigid adoption of strong control would result in many standard or seemingly reasonable approaches no longer being considered acceptable by statisticians. We include some of them in Table I; in these examples, type 1 error rates will not be strongly controlled if it turns out that the null hypothesis for at least one primary endpoint is false. In some cases, even if strong control were felt necessary, the increase in the type 1 error rate is marginal.
For example, when considering group sequential designs, if a secondary endpoint is tested at a one-sided 5% significance level, the maximum type 1 error rate is 8% [13, 14]; furthermore, this increase is negligible except for a very narrow set of combinations of (high) primary-to-secondary endpoint correlations and true treatment effects. So, is the complexity of applying a sophisticated MCP to secondary endpoints a worthwhile endeavor in such a case?
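The group sequential example can be explored with a small Monte Carlo sketch. This is an illustration under stated assumptions, not the calculation reported in [13, 14]: it assumes a two-stage design with an interim analysis at half the information, approximate O'Brien-Fleming boundaries (2.797, 1.977) for the primary endpoint at one-sided 2.5%, and a truly null secondary endpoint that is tested once, at one-sided 5%, at whichever stage the primary endpoint is rejected. The function name `secondary_type1_error` and its parameters `delta` (expected final z-statistic of the primary endpoint) and `rho` (correlation between endpoints) are hypothetical names introduced here.

```python
import math
import random


def secondary_type1_error(delta, rho, z_secondary=1.645,
                          bounds=(2.797, 1.977), n_sim=100_000, seed=1):
    """Estimate how often a truly null secondary endpoint is rejected.

    delta       : expected final z-statistic of the primary endpoint (assumed)
    rho         : correlation between primary and secondary endpoints (assumed)
    z_secondary : critical value for the secondary test (1.645 = one-sided 5%)
    bounds      : assumed approximate O'Brien-Fleming boundaries, two looks
    """
    rng = random.Random(seed)
    c1, c2 = bounds
    resid = math.sqrt(1.0 - rho * rho)
    root_half = math.sqrt(0.5)
    false_rejections = 0
    for _ in range(n_sim):
        # Independent stage-wise increments for the primary endpoint.
        p1, p2 = rng.gauss(0, 1), rng.gauss(0, 1)
        # Secondary increments are correlated (rho) with the primary ones.
        s1 = rho * p1 + resid * rng.gauss(0, 1)
        s2 = rho * p2 + resid * rng.gauss(0, 1)
        zp1 = p1 + delta * root_half          # primary z at the interim look
        zp2 = (p1 + p2) * root_half + delta   # primary z at the final look
        zs1 = s1                              # secondary z at interim (null true)
        zs2 = (s1 + s2) * root_half           # secondary z at final (null true)
        if zp1 > c1:
            # Primary rejected at interim: trial stops, secondary tested now.
            false_rejections += zs1 > z_secondary
        elif zp2 > c2:
            # Primary rejected at final: secondary tested on the final data.
            false_rejections += zs2 > z_secondary
    return false_rejections / n_sim


if __name__ == "__main__":
    for delta, rho in [(0.0, 1.0), (1.63, 1.0), (1.63, 0.5), (1.63, 0.0), (6.0, 1.0)]:
        rate = secondary_type1_error(delta, rho)
        print(f"delta={delta:4.2f}  rho={rho:3.1f}  estimated error rate={rate:.4f}")
```

Comparing a run with `rho` near 1 and an intermediate `delta` against runs with `rho = 0`, `delta = 0`, or a very large `delta` gives a feel for how narrow the region of inflation is under these assumptions.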
[1] L. Hothorn, et al. Simultaneous confidence intervals on multivariate non-inferiority. Statistics in Medicine, 2013.
[2] Lingyun Liu, et al. Testing a Primary and a Secondary Endpoint in a Group Sequential Design. Biometrics, 2010.
[3] Frank Bretz, et al. Hierarchical testing of multiple endpoints in group-sequential trials. Statistics in Medicine, 2010.
[4] Gonzalo Durán Pacheco, et al. Multiple Testing Problems in Pharmaceutical Statistics. 2009.
[5] W. Ahmad. Ticagrelor versus Clopidogrel in Patients with Acute Coronary Syndromes. 2009.
[6] O. Guilbaud, et al. A recycling framework for the construction of Bonferroni-based multiple tests. Statistics in Medicine, 2009.
[7] W. Brannath, et al. A graphical approach to sequentially rejective multiple test procedures. Statistics in Medicine, 2009.
[8] Sue-Jane Wang, et al. Some Controversial Multiple Testing Problems in Regulatory Applications. Journal of Biopharmaceutical Statistics, 2009.
[9] R. Muirhead, et al. Multiple Co-primary Endpoints: Medical and Statistical Solutions: A Report from the Multiple Endpoints Expert Team of the Pharmaceutical Research and Manufacturers of America. 2007.
[10] Xin Wang, et al. Stepwise Gatekeeping Procedures in Clinical Trial Applications. Biometrical Journal (Biometrische Zeitschrift), 2006.
[11] A. Tamhane, et al. Erratum: Stepwise gatekeeping procedures in clinical trial applications (Biometrical Journal (2006) 48:6 (984-991)). 2006.
[12] Qi Jiang, et al. Analysis of multiple endpoints in clinical trials: it's time for the designations of primary, secondary and tertiary to go. Pharmaceutical Statistics, 2011.
[13] Richard J. Cook, et al. Multiplicity Considerations in the Design and Analysis of Clinical Trials. 1996.