Mining officially unrecognized side effects of drugs by combining web search and machine learning

We consider the problem of finding officially unrecognized side effects of drugs. Submitting Web queries that contain a given drug name retrieves pages concerning the drug, but many retrieved pages are irrelevant and some relevant pages are not retrieved. More relevant pages can be obtained by adding the drug's active ingredient to the query. To eliminate the irrelevant pages, we propose a machine learning process that filters them out, and we show experimentally that this process is very effective. Because obtaining training data for the machine learning process can be time-consuming and expensive, we provide an automatic method for generating the training data, which is also shown to be very accurate. For three drugs, side effects that are not recognized by the FDA were validated by an expert. We believe that the same approach can be applied to many real-life problems and will yield high precision, and that it could thus lead to a new way of performing retrieval with high accuracy.
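The two steps described above, query expansion with the active ingredient and learned filtering of retrieved pages, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the classifier, the query syntax, or the features used, so a small multinomial naive Bayes filter stands in for the machine learning process, the names `DrugX` and `ingredientx` are hypothetical, and the four hand-labeled training pages are toy data rather than output of the automatic training-data method.

```python
import math
import re
from collections import Counter


def build_queries(drug_name, active_ingredient):
    """Return a drug-name query and an expanded query that adds the ingredient.

    The quoting and OR syntax are assumptions; real search engines differ.
    """
    return [
        f'"{drug_name}" side effects',
        f'"{drug_name}" OR "{active_ingredient}" side effects',
    ]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesFilter:
    """Multinomial naive Bayes with add-one smoothing (an assumed stand-in)."""

    def fit(self, pages, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for page, label in zip(pages, labels):
            self.word_counts[label].update(tokenize(page))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, page):
        # Words never seen in training carry no evidence, so they are skipped.
        tokens = [t for t in tokenize(page) if t in self.vocab]
        total_pages = sum(self.class_counts.values())
        best_class, best_score = None, -math.inf
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            score = math.log(self.class_counts[c] / total_pages)
            for t in tokens:
                smoothed = self.word_counts[c][t] + 1
                score += math.log(smoothed / (total_words + len(self.vocab)))
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Toy, hand-labeled pages (illustrative only).
train_pages = [
    "patients taking the drug reported severe headache and nausea",
    "I developed muscle pain and dizziness after starting this medication",
    "buy cheap pills online with free shipping discount",
    "company stock price rises after quarterly earnings report",
]
train_labels = ["relevant", "relevant", "irrelevant", "irrelevant"]

queries = build_queries("DrugX", "ingredientx")
page_filter = NaiveBayesFilter().fit(train_pages, train_labels)

retrieved = [
    "reported nausea and dizziness after taking the medication",
    "cheap pills online discount shipping",
]
kept = [p for p in retrieved if page_filter.predict(p) == "relevant"]
```

In this sketch the expanded query is meant to raise recall, while the filter is meant to restore precision by discarding pages that merely mention the drug, such as advertisements. Any other text classifier could be substituted for the naive Bayes filter.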
