Multivariate analysis of the population representativeness of related clinical studies

OBJECTIVE To develop a multivariate method for quantifying population representativeness across related clinical studies and a computational method for identifying and characterizing underrepresented subgroups in clinical studies.

METHODS We extended a published metric, the Generalizability Index for Study Traits (GIST), to include multiple study traits for quantifying the population representativeness of a set of related studies, assuming independence and equal importance among all study traits. On this basis, we qualitatively compared the effectiveness of GIST and multivariate GIST (mGIST). We further developed an algorithm called "Multivariate Underrepresented Subgroup Identification" (MAGIC) for constructing optimal combinations of distinct value intervals of multiple traits to define underrepresented subgroups in a set of related studies. Using Type 2 diabetes mellitus (T2DM) as an example, we identified and extracted frequently used quantitative eligibility criteria variables in a set of clinical studies. We profiled the T2DM target population using National Health and Nutrition Examination Survey (NHANES) data.

RESULTS According to the mGIST scores for four example variables (age, HbA1c, BMI, and gender), the included observational T2DM studies had better population representativeness than the interventional T2DM studies. Among the interventional T2DM studies, Phase I trials had better population representativeness than Phase III trials. People at least 65 years old with HbA1c values between 5.7% and 7.2% were particularly underrepresented in the included T2DM trials. These results confirmed established knowledge and demonstrated the effectiveness of our methods in population representativeness assessment.

CONCLUSIONS mGIST is effective at quantifying population representativeness of related clinical studies using multiple numeric study traits. MAGIC identifies underrepresented subgroups in clinical studies. Both data-driven methods can be used to improve the transparency of design bias in participant selection at the research community level.
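The abstract does not state the mGIST formula or the MAGIC search procedure, so the following Python sketch only illustrates the general idea under simplifying assumptions; it is not the published method. It assumes each study's eligibility criteria reduce to one inclusive numeric range per trait, scores a target population by the weighted average share of studies each person would qualify for, and then averages that share within cells formed by fixed value intervals to surface poorly covered subgroups. All names (`STUDIES`, `POPULATION`, `BINS`, `coverage_score`, `subgroup_rates`) and all data values are hypothetical.

```python
from collections import defaultdict

# Hypothetical eligibility ranges per study: trait -> (low, high), both inclusive.
STUDIES = [
    {"age": (18, 65), "hba1c": (7.0, 10.0)},
    {"age": (18, 75), "hba1c": (6.5, 11.0)},
    {"age": (30, 70), "hba1c": (7.5, 10.5)},
]

# Hypothetical target-population records with survey-style weights (made-up values).
POPULATION = [
    {"age": 45, "hba1c": 8.0, "weight": 1.0},
    {"age": 70, "hba1c": 6.8, "weight": 1.0},
    {"age": 80, "hba1c": 6.0, "weight": 1.0},
    {"age": 55, "hba1c": 7.2, "weight": 1.0},
]

# Hypothetical half-open value intervals [low, high) used to define subgroup cells.
BINS = {
    "age": [(18, 65), (65, 120)],
    "hba1c": [(5.7, 7.2), (7.2, 12.0)],
}


def is_eligible(person, study):
    """True if the person falls inside every trait range of the study."""
    return all(lo <= person[trait] <= hi for trait, (lo, hi) in study.items())


def inclusion_rate(person, studies):
    """Share of studies whose ranges admit this person on all traits jointly."""
    return sum(is_eligible(person, s) for s in studies) / len(studies)


def coverage_score(population, studies):
    """Weighted mean inclusion rate over the target population (0 to 1)."""
    total = sum(p["weight"] for p in population)
    return sum(p["weight"] * inclusion_rate(p, studies) for p in population) / total


def find_bin(value, intervals):
    """Return the half-open interval containing value, or None if none does."""
    for lo, hi in intervals:
        if lo <= value < hi:
            return (lo, hi)
    return None


def subgroup_rates(population, studies, bins):
    """Weighted mean inclusion rate for each occupied cell of the interval grid."""
    weight_sum = defaultdict(float)
    rate_sum = defaultdict(float)
    for p in population:
        cell = tuple((t, find_bin(p[t], ivs)) for t, ivs in bins.items())
        if any(interval is None for _, interval in cell):
            continue  # person falls outside the grid; skip
        weight_sum[cell] += p["weight"]
        rate_sum[cell] += p["weight"] * inclusion_rate(p, studies)
    return {cell: rate_sum[cell] / weight_sum[cell] for cell in weight_sum}


if __name__ == "__main__":
    print("coverage:", coverage_score(POPULATION, STUDIES))
    for cell, rate in sorted(subgroup_rates(POPULATION, STUDIES, BINS).items(),
                             key=lambda kv: kv[1]):
        print(cell, round(rate, 3))
```

Cells with the lowest mean inclusion rate are candidates for underrepresented subgroups. The sketch uses a fixed grid of intervals, whereas the abstract describes MAGIC as constructing optimal interval combinations, so the fixed grid here is a stand-in for that search.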
