A Machine Learning Approach for High-Dimensional Time-to-Event Prediction With Application to Immunogenicity of Biotherapies in the ABIRISK Cohort

Predicting the immunogenicity of biotherapies from patient- and drug-related factors remains a challenging problem. With the growing ability to collect massive amounts of data, machine learning algorithms can provide efficient predictive tools. Using the bioclinical data collected by the ABIRISK consortium in a multi-cohort of patients with autoimmune diseases treated with biotherapies, we evaluated the predictive power of a custom-built random survival forest for predicting the occurrence of anti-drug antibodies. This procedure accounts for a population composed of both immune-reactive and immune-tolerant subjects, and for the very small expected proportion of relevant predictive variables. Application to the ABIRISK cohort shows that this approach provides good predictive accuracy and outperforms the classical random survival forest procedure. Moreover, the individual predicted probabilities make it possible to separate patients into high- and low-risk groups. To the best of our knowledge, this is the first study to evaluate machine learning procedures for predicting biotherapy immunogenicity from bioclinical information. Such an approach may provide useful information for clinical practice by helping to stratify patients before they receive a biotherapy.
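A population that mixes immune-reactive and immune-tolerant subjects is often described with a cure-type survival model, in which the survival curve levels off at the tolerant fraction instead of dropping to zero. The sketch below illustrates that idea only, using the common mixture formulation with an exponential time-to-event among reactive subjects. The abstract does not state which cure formulation the custom forest relies on, so the function name, the parameters `p_reactive` and `rate`, and the exponential assumption are all hypothetical.

```python
import math


def population_survival(t, p_reactive, rate):
    """Probability of remaining free of anti-drug antibodies at time t.

    Illustrative mixture model (an assumption, not the paper's method):
      - a fraction (1 - p_reactive) is immune-tolerant and never has the event;
      - a fraction p_reactive is immune-reactive, with an exponential
        time-to-event of hazard `rate` (hypothetical choice).
    """
    if not 0.0 <= p_reactive <= 1.0:
        raise ValueError("p_reactive must lie in [0, 1]")
    if t < 0 or rate < 0:
        raise ValueError("t and rate must be non-negative")
    s_reactive = math.exp(-rate * t)  # survival among reactive subjects
    return (1.0 - p_reactive) + p_reactive * s_reactive


# With 30% reactive subjects, the curve starts at 1 and plateaus at 0.7.
curve = [population_survival(t, p_reactive=0.3, rate=0.1) for t in (0, 12, 1e6)]
```

Under this formulation the long-run plateau equals the tolerant fraction, which is why a standard survival forest that implicitly assumes every subject eventually has the event can be a poor fit for such data.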
