Machine Learning to Improve Treatment Selection for NSCLC Patients Treated with Immunotherapy Using Real World and Translational Data

Background: In advanced Non-Small Cell Lung Cancer (NSCLC), Programmed Death Ligand 1 (PD-L1) remains the only biomarker in use for selecting candidates for immunotherapy (IO), and it has many limitations. Given the complex dynamics of the immune system, it is unlikely that a single biomarker can predict outcome with high accuracy. A promising way to cope with this complexity is offered by Artificial Intelligence (AI) and Machine Learning (ML), techniques able to analyse and interpret large multifactorial datasets. The present study aims to use AI tools to improve response and efficacy prediction in NSCLC patients treated with IO.

Methods: Real world data (clinical data, PD-L1, histology, molecular, lab tests) and the blood microRNA signature classifier (MSC), which includes 24 different microRNAs, were used. Patients were divided into responders (R), who obtained a complete response, partial response, or stable disease as best response, and non-responders (NR), who experienced progressive or hyperprogressive disease or who died before the first radiologic evaluation. Moreover, we used the same data to determine whether the overall survival of a patient was likely to be shorter or longer than 24 months from baseline IO. A literature review and a forward feature selection technique were used to extract a specific subset of the patients' data. To develop the final predictive model, different ML methods were tested, i.e., Feedforward Neural Network (FFNN), Logistic Regression (LR), K-nearest neighbors (K-NN), Support Vector Machines (SVM), and Random Forest (RF).

Results: 200 patients were included. 164 out of 200 (i.e., only those patients with PD-L1 data available) were considered in the model; 73 (44.5%) were R and 91 (55.5%) were NR.
Overall, the best model was the LR and included 5 features: 2 clinical features (the ECOG performance status and the IO line of therapy); 1 tissue feature (PD-L1 tumour expression); and 2 blood features (the MSC test and the neutrophil-to-lymphocyte ratio, NLR). The model predicting R/NR achieved accuracy ACC=0.756, F1 score F1=0.722, and Area Under the ROC Curve AUC=0.82. PD-L1 alone had ACC=0.655. The accuracy of the ML models with some of the features excluded was as follows: without the PD-L1 value, ACC=0.726; without MSC, ACC=0.750; and without both PD-L1 and MSC, i.e., considering only clinical features, ACC=0.707. At data cut-off (Nov 2020), median Overall Survival (mOS) was 38.5 months (m) (95% CI 23.9 - 53.1) for R vs 3.8 m (95% CI 2.8 - 4.7) for NR, with p<0.001. LR was also the best-performing model in predicting patients with long survival (24-month OS), achieving ACC=0.839, F1=0.908, and AUC=0.87.

Conclusions: The results suggest that the integration of multifactorial data provided by ML techniques is a useful tool to improve personalized selection of NSCLC patients who are candidates for IO. Compared with PD-L1 alone, the expected improvement in accuracy was around 10 percentage points (ACC=0.756 vs 0.655). The model shows that the higher the ECOG, NLR value, IO line, and MSC test level, the lower the response, and the higher the PD-L1, the higher the response. Considering the difference in survival between the R and NR groups, these results suggest that the model can also be used to indirectly predict survival. Moreover, a second model was able to predict long-survival patients with good accuracy.
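
The abstract reports ACC, F1, and AUC without defining them. The following is a minimal sketch of how such metrics are commonly computed for a binary R/NR classifier, assuming R is the positive class (label 1) and a decision threshold of 0.5. The function names, the threshold, and the toy labels and scores are illustrative assumptions, not the study's data or implementation.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, with 1 = responder (assumed)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of patients whose class is predicted correctly."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """Probability that a random positive scores higher than a random negative.

    Ties count as one half. This pairwise form is equivalent to the area
    under the ROC curve.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (invented values, NOT study data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.7]  # hypothetical model probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed threshold of 0.5

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"ACC={acc:.3f}  F1={f1:.3f}  AUC={auc:.4f}")
```

ACC and F1 depend on the chosen threshold, whereas AUC is computed from the ranking of the scores alone, which is one reason the three figures can differ noticeably for the same model.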