Comparative analysis of machine learning-based QSAR models and molecular docking studies to screen potential anti-tubercular inhibitors against InhA of Mycobacterium tuberculosis

Machine learning techniques are advanced computational methods that can be used to build quantitative structure-activity relationship (QSAR) models from a dataset of compounds, identifying the descriptors that best predict a specific biological activity so that the activity of untested compounds can be predicted and better drugs discovered. In the present study, descriptors were optimised using correlation-based feature selection (CFS), principal component analysis, and genetic programming, and several machine learning techniques were then used to build QSAR models on three different experimental datasets of inhibitors of InhA of Mycobacterium tuberculosis (Mtb). The best QSAR models were applied to a dataset of 1450 approved drugs from DrugBank to screen for new InhA inhibitors. Amoxicillin showed the highest predicted activity (pIC50 = 6.54) and Itraconazole the second highest (pIC50 = 6.4), as calculated by the best random forest (RF) model using the CFS-GS-FW descriptor set on the ChEMBL997779 dataset of InhA of Mtb. Additionally, screening by molecular docking identified the 10 top-ranked approved drugs as anti-tubercular hits, with G-scores of -8.23 to -6.95 kcal/mol, compared with G-scores of -7.86 to -6.68 kcal/mol for the control compounds (known inhibitors of Mtb InhA). These results indicate that these compounds may have better binding affinity for InhA of Mtb. From our studies, we conclude that machine learning-based QSAR models can be useful for the development of novel target-specific anti-tubercular compounds.
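
For readers less familiar with the pIC50 scale used for the predicted activities, the following minimal sketch illustrates the standard definition, pIC50 = -log10(IC50 in mol/L). The function names are hypothetical and chosen for illustration only; they are not part of the study's modelling pipeline.

```python
import math


def pic50_to_ic50_molar(pic50: float) -> float:
    """Convert a pIC50 value to IC50 in mol/L, using pIC50 = -log10(IC50 [M])."""
    return 10.0 ** (-pic50)


def ic50_molar_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 in mol/L to pIC50; IC50 must be strictly positive."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_molar)


# Predicted pIC50 values reported in the abstract (illustrative conversion only).
for name, pic50 in [("Amoxicillin", 6.54), ("Itraconazole", 6.4)]:
    ic50_um = pic50_to_ic50_molar(pic50) * 1e6  # mol/L -> micromolar
    print(f"{name}: pIC50 = {pic50} -> predicted IC50 of about {ic50_um:.2f} uM")
```

Under this definition, the predicted pIC50 values of 6.54 and 6.4 correspond to IC50 values of roughly 0.29 µM and 0.40 µM, respectively.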