Predictive modeling of anti-malarial molecules inhibiting apicoplast formation

Background: Malaria is a major healthcare problem worldwide, resulting in an estimated 0.65 million deaths every year. It is caused by members of the parasite genus Plasmodium. The current therapeutic options for malaria are limited to a few classes of molecules and are fast shrinking due to the emergence of widespread drug resistance in the pathogen. The recent availability of high-throughput phenotypic screen datasets for antimalarial activity makes it possible to create computational models of bioactivity based on chemical descriptors of molecules, with the potential to accelerate drug discovery for malaria.

Results: In the present study, we used high-throughput screen datasets for the discovery of apicoplast inhibitors of the malarial pathogen, as assayed from the delayed death response. We employed a machine learning approach and developed computational models to predict the biological activity of new antimalarial compounds. The molecules were further evaluated for common substructures using a Maximum Common Substructure (MCS) based approach.

Conclusions: We created computational models using state-of-the-art machine learning algorithms and evaluated them on multiple statistical criteria. We found that the Random Forest based approach provided better accuracy, as assessed by ROC curve analysis. We further evaluated the active molecules using a substructure based approach to identify common substructures enriched in the active set. We argue that the computational models generated could be used effectively to screen large molecular datasets and prioritize compounds for phenotypic screens, drastically reducing cost while improving the hit rate.
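As an illustration of the ROC curve analysis mentioned above, the following minimal sketch computes the area under the ROC curve (AUC) for a classifier's scores on labeled compounds. It is not the authors' pipeline: the function name `roc_auc`, the labels, and the scores are all hypothetical, and the AUC is obtained through the standard pairwise (Mann-Whitney) formulation rather than from any particular toolkit.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (active, inactive) pairs ranked correctly.

    labels: 1 for active, 0 for inactive (hypothetical encoding).
    scores: predicted probability or score of being active.
    Ties between an active and an inactive score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive compound")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 3 actives and 5 inactives with made-up model scores.
labels = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of actives from inactives, which is why the measure is convenient for comparing classifiers on imbalanced screening data.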
