Exploring the Efficacy of Generic Drugs in Treating Cancer

Thousands of scientific publications discuss evidence on the efficacy of non-cancer generic drugs being tested for cancer. However, manually identifying and extracting such evidence is intractable at scale. We introduce a natural language processing pipeline that automates the identification of relevant studies and facilitates the extraction of therapeutic associations between generic drugs and cancers from PubMed abstracts. We annotate datasets of drug-cancer evidence and use them to train models that identify and characterize such evidence at scale. To make this evidence readily consumable, we incorporate the model results in a web application that allows users to browse documents and their extracted evidence. Users can provide feedback on the quality of the evidence extracted by our models. This feedback is used to improve our datasets and the corresponding models in a continuous integration system. We describe the natural language processing pipeline in our application and the steps required to deploy services based on the machine learning models.

Repurposing Generic Drugs for Cancer

Each year nearly 10 million people die from cancer (Cancer Research UK 2020) and the cost of cancer diagnosis and treatment exceeds USD 1 trillion (Union for International Cancer Control 2014). Pharmaceutical research exploring new drugs to treat various cancers is an expensive and time-consuming process. In contrast, many generic drugs available today are inexpensive and show promising results in treating different types of cancer. Moreover, several drugs have already been successfully repurposed for cancer. For example, Thalidomide, a drug originally used to treat morning sickness in pregnant women, later proved useful for treating skin lesions and multiple myeloma. Finding new therapeutic uses for inexpensive generic drugs (“drug repurposing”) could rapidly create affordable new treatments.
Hundreds of non-cancer generic drugs have shown promise for treating cancer, but it is unclear which drugs should be considered for repurposing. Scientific publications such as pre-clinical laboratory studies and small-scale clinical trials present evidence on generic drugs being used as cancer treatments. The Repurposing Drugs in Oncology (ReDO) project manually inspected articles indexed by PubMed and found anticancer evidence for more than 200 non-cancer generic drugs (Pantziarka et al. 2017; Bouche, Pantziarka, and Meheus 2017; Verbaanderd et al. 2017). However, PubMed indexes millions of articles and the collection is continuously updated. Therefore, manual review to identify and analyze the evidence is time-consuming and intractable at scale. It is imperative to devise (semi)automated techniques to extract and collate the existing evidence. Machine learning (ML)-powered evidence synthesis could provide a comprehensive and real-time view of drug repurposing data and enable actionable insights.

We have started an ambitious initiative to extract and synthesize the plethora of scientific evidence on generic drugs used for cancer treatment. Our goal is to identify the most promising drugs to repurpose for different kinds of cancer. Identifying drug-cancer evidence from scientific abstracts is not trivial. The articles that discuss cancer interventions use domain-specific jargon, which makes the text hard to comprehend for both non-expert humans and machines that are not trained on domain-specific data (Lehman et al. 2019).

Copyright © 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)

This endeavor requires close collaboration between experts in different disciplines, such as cancer research (to provide guidance, annotate datasets, and verify results), machine learning (to collect and process datasets to be annotated, to devise machine learning models, and to evaluate their performance), and software engineering (to deploy and run models as an end-to-end online application). Ultimately, implementing repurposed therapies as the standard of care in medical practice requires definitive clinical trials, new incentives and business models to fund them, and engagement by various stakeholders such as patients, doctors, payers, and policymakers. In this paper, we highlight the key technical aspects of identifying and extracting relevant evidence from PubMed articles and describe the steps required to encapsulate, dis