Introduction
After several decades of development, meta-analysis has become a pillar of evidence-based medicine. Heterogeneity, however, remains a threat to the validity and quality of such studies. Currently, Cochran's Q test and its descendant, the I² (I-squared) statistic, are widely used as tools for evaluating heterogeneity. The core mission of these tests is to identify data sets drawn from similar populations and to exclude those drawn from different populations. Although Q and I² serve as the default tools for heterogeneity testing, the work we present here demonstrates that the robustness of these two tools is questionable.

Methods and Findings
We simulated a strictly normalized population S. The simulation successfully represents randomized controlled trial data sets, which fit the theoretical distribution perfectly (experimental group: p = 0.37; control group: p = 0.88). We then randomly generated research samples Si that fit the population with only tiny deviations. In short, these data sets are ideal and can be regarded as completely homogeneous data from exactly the same population. If Q and I² were truly robust tools, their results on our simulated data sets should not be positive. We then synthesized these trials using a fixed-effect model (a minimal sketch of such a simulation is given below). The pooled results indicated that the mean difference (MD) corresponds closely with the true values and that the 95% confidence interval (CI) is narrow. However, when the number of trials and the sample sizes of the trials enrolled in the meta-analysis were substantially increased, the Q and I² values also increased steadily. This result indicates that Q and I² are suitable only for testing heterogeneity among trials with small sample sizes, and are not applicable when the sample sizes and the number of trials increase substantially.

Conclusions
Every day, meta-analyses containing flawed data analysis emerge and are passed on to clinical practitioners as "updated evidence". Using evidence of this kind, built on heterogeneous data sets, leads to wrong conclusions, creates confusion in clinical practice, and weakens the foundation of evidence-based medicine. We suggest a stricter application of meta-analysis: it should be applied only to synthesized trials with small sample sizes. We call for the tools of evidence-based medicine to be kept up to date with cutting-edge technologies in data science. Clinical research data should be made publicly available whenever a relevant article is published, so that the research community can conduct in-depth data mining, which in many instances is a better alternative to meta-analysis.
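The kind of simulation described above can be reproduced in outline as follows. This is a minimal sketch, assuming normally distributed continuous outcomes and inverse-variance fixed-effect pooling; the population parameters (arm means 10 and 8, standard deviation 2) and the grid of trial counts and sample sizes are illustrative choices, not the values used in the study.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)

def simulate_trial(n, mu_exp=10.0, mu_ctl=8.0, sd=2.0):
    """Draw one two-arm trial (n per arm) from a single normal
    population, i.e., perfectly homogeneous data."""
    arm_t = rng.normal(mu_exp, sd, n)
    arm_c = rng.normal(mu_ctl, sd, n)
    md = arm_t.mean() - arm_c.mean()                 # mean difference
    var_md = arm_t.var(ddof=1) / n + arm_c.var(ddof=1) / n
    return md, var_md

def fixed_effect_meta(effects, variances):
    """Inverse-variance fixed-effect pooling with Cochran's Q
    and the Higgins-Thompson I^2 statistic."""
    effects = np.asarray(effects)
    w = 1.0 / np.asarray(variances)                  # inverse-variance weights
    pooled = np.sum(w * effects) / np.sum(w)
    q = np.sum(w * (effects - pooled) ** 2)          # Cochran's Q
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    p_q = stats.chi2.sf(q, df)                       # p-value of Q under H0
    return pooled, q, p_q, i2

# Increase the number of trials k and the per-arm sample size n together
# and observe how Q and I^2 behave on perfectly homogeneous data.
for k, n in [(5, 50), (20, 500), (100, 5000)]:
    md, var = zip(*(simulate_trial(n) for _ in range(k)))
    pooled, q, p_q, i2 = fixed_effect_meta(md, var)
    print(f"k={k:4d} n={n:5d}  MD={pooled:6.3f}  "
          f"Q={q:7.2f} (p={p_q:5.3f})  I2={i2:5.1f}%")
```

Under these assumptions, Q follows a chi-squared distribution with k − 1 degrees of freedom when the trials are truly homogeneous, so the printed p-values and I² values give a direct check of how the two statistics behave as k and n grow.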