Statistical issues
Dedicated to the memory of Professor Joseph L. Fleiss

This paper will focus on factors that affect the integrity or performance of trial results. I was pleased when the NINDS adopted the acronym PSMB, for performance and safety monitoring boards, as opposed to data and safety monitoring boards, because to me the P stressed the importance of performance, and performance means integrity. I would like to preface my remarks with a concept from the field of research bioethics: the principle of clinical equipoise. This principle is used most often to justify conducting a randomized clinical trial, randomizing patients to possibly risky therapies, especially in life-threatening situations with desperately ill patients, such as the ALS community. To justify that randomization and its risk, however minimized it might be, there must be a substantial lack of consensus in the clinical community concerning which treatment is optimal. That of course does not preclude individual clinicians holding strong opinions about the optimal therapy, but if a substantial lack of consensus exists in the clinical community on the question, then some of those strong opinions must be wrong, and it is therefore imperative to conduct the randomized clinical trial. This concept is well known. What is less well known is the other leg of the principle of clinical equipoise: the trial must have a clear ability to perturb the state of clinical equipoise at its end. This is where biostatistics comes in, because there are so many threats to the validity of a clinical trial, and therefore to its ability to deliver a definitive answer, whether that answer be that (a) is clearly superior to (b), or vice versa, or that (a) is clearly equivalent to (b).
Insufficient sample size is the key factor that destroys the validity of clinical trials, but what is less clear are the many ways in which a trial can end up with an insufficient sample size, notwithstanding the optimistic calculations of biostatisticians and clinicians alike. One of the key threats is crossover: patients who are randomized to treatment (a) but who actually receive treatment (b), or vice versa. If, for example, 25% of the patients randomized to one arm receive, for whatever reason, the other therapy, and similarly 25% in the second arm receive the first treatment, then the effect size, i.e., the difference in the success rates, is essentially reduced to one half the size originally contemplated assuming no crossovers. When you reduce the efficacy difference to one half of its original size, you need four times the number of patients to achieve the same degree of statistical certainty, because the required sample size grows with the inverse square of the effect size. Moreover, it is not enough simply to recruit, say, 10% more patients in response to the crossover. The effects of crossovers must be anticipated clearly in advance and built into the sample size calculations; otherwise the trial will end up hopelessly underpowered. This example may seem hypothetical, but about 20 years ago coronary artery bypass graft (CABG) surgery was the latest development for treating stable angina, and there was a randomized trial – the Coronary Artery Surgery Study (CASS) – intended to test the then-standard medical treatment for these patients against CABG surgery.
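The arithmetic above can be sketched in a few lines. The 50% versus 60% success rates below are hypothetical, chosen only to illustrate the dilution; the formula is the standard normal-approximation sample-size calculation for comparing two proportions.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Patients per arm for a two-sided two-proportion z-test
    (standard normal-approximation formula)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # critical value, two-sided
    z_b = NormalDist().inv_cdf(power)           # quantile for desired power
    pbar = (p1 + p2) / 2
    numerator = (z_a * sqrt(2 * pbar * (1 - pbar))
                 + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# Planned comparison: hypothetical success rates of 50% vs 60%.
n_planned = n_per_arm(0.50, 0.60)

# With 25% crossover in each arm, each arm's observed success rate
# is a 75/25 mixture of the two treatments, pulling the rates together:
p1_itt = 0.75 * 0.50 + 0.25 * 0.60   # 0.525
p2_itt = 0.75 * 0.60 + 0.25 * 0.50   # 0.575
n_itt = n_per_arm(p1_itt, p2_itt)    # difference halved, n roughly quadrupled
```

With these illustrative rates the halved difference (5 percentage points instead of 10) pushes the required sample per arm from roughly 390 to roughly 1,550 – a factor of about four, exactly as the inverse-square relationship predicts.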
The protocol did not specify under what circumstances a patient assigned to the medical arm could be crossed over to CABG surgery; doctors were allowed to cross their patients over ad libitum, and after about five years of follow-up some 25% of the patients randomized to the medical arm had been given CABG surgery, while a smaller percentage, roughly 3–5% of those assigned to surgery, did not receive it and remained on medical maintenance while awaiting their procedures. The result of the trial was no significant difference between the two arms under the intent-to-treat analysis, which was the appropriate analysis. However, one egregious consequence was that the media inferred, incorrectly, that the trial demonstrated deferred surgery to be just as good as medicine. Because of the crossover, the statistical power of that comparison, under the intent-to-treat principle, was about 30%, which means that if you ran 10 CASS studies with a true effect size such as was actually found (namely, that surgery improved mortality by 1.5 percentage points), you would expect only three of them to show a statistically significant finding. There is a lesson in that tale. Furthermore, the suspicion that the patients who crossed over from the medical to the surgical arm were sicker, and would therefore have a higher mortality rate than expected for non-crossovers, is completely wrong; in the trial, those who crossed over from the medical arm to receive CABG surgery had a much better survival rate than any of the other subgroups one might have examined. The other lesson in that tale is that our suspicions, our hunches, our presumptions, and our assumptions are not entirely reliable. This is the justification for the intent-to-treat analysis, which is examined below. So, crossover is something to watch out for.
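The link between crossover, power, and replication can be made concrete with a small sketch. The rates and sample size here are illustrative, not the CASS figures; the point is only the mechanism by which crossover dilution erodes power under intent-to-treat.

```python
from math import sqrt
from statistics import NormalDist

def power_two_prop(p1, p2, n, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test, n per arm
    (normal approximation)."""
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - alpha / 2)
    pbar = (p1 + p2) / 2
    se_null = sqrt(2 * pbar * (1 - pbar) / n)            # SE under H0
    se_alt = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n)   # SE under H1
    return nd.cdf((abs(p1 - p2) - z_a * se_null) / se_alt)

# A trial sized for 80% power at hypothetical rates 0.50 vs 0.60
# (about 388 patients per arm):
full = power_two_prop(0.50, 0.60, 388)       # about 0.80

# Analysed by intent-to-treat with 25% crossover in each arm, the
# observed rates are pulled to 0.525 vs 0.575 and power collapses:
diluted = power_two_prop(0.525, 0.575, 388)  # roughly 0.29
```

At roughly 30% power, the expected number of statistically significant results in 10 replications of the trial is simply 10 × 0.3 = 3 – the same arithmetic behind the "three out of ten" reading of CASS above.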
Other forms of non-compliance, such as missing data and dropout, pose further key problems and are discussed in the paper by Dr. Thompson. Potentially biased analyses – essentially any analysis other than intent-to-treat – should be avoided as the primary report of a clinical trial. This includes