Use of text-mining methods to improve efficiency in the calculation of drug exposure to support pharmacoepidemiology studies

Abstract

Background
Efficient generation of structured dose instructions that enable researchers to calculate drug exposure is central to pharmacoepidemiology studies. Our aim was to design and test an algorithm to codify free-text dose instructions and to apply it to the NHS Scotland Prescribing Information System (PIS), which records about 100 million prescriptions per annum.

Methods
A natural language processing (NLP) algorithm was developed that represents each free-text dose instruction by three attributes (quantity, frequency and qualifier), specified by three, three and two variables, respectively. A sample of 15 593 distinct dose instructions was used to test, validate and refine the algorithm. The final algorithm used a zero-assumption approach, imposing no assumptions of its own on the output, and was then applied to the full dataset.

Results
The initial algorithm generated structured output for 13 152 (84.34%) of the 15 593 sampled dose instructions. Reviewers identified 767 (5.83%) of these outputs as incorrect translations, giving an accuracy of 94.17%. After further refinement of the rules, the algorithm was applied to the full dataset of 458 227 687 prescriptions, 99.67% of which had a dose instruction; these were represented by 4 964 083 distinct instruction texts. The algorithm generated structured output for 92.3% of dose instruction texts, varying by therapeutic area from 86.7% for the central nervous system to 96.8% for the cardiovascular system.

Conclusions
We created an NLP algorithm, operational at scale, that produces structured output giving data users maximum flexibility to formulate, test and apply their own assumptions according to the medicines under investigation. Text-mining approaches can support the safe and efficient management and provision of the large volumes of data generated through our health systems.
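To make the three-attribute representation concrete, here is a minimal Python sketch of a rule-based codifier. It is not the published algorithm. The field names (qty_min, qty_max, qty_unit, freq_min, freq_max, period_hours, as_required, qualifier_text), the regular-expression rules and the functions parse_dose and max_daily_quantity are all hypothetical, chosen only to give three quantity, three frequency and two qualifier variables. The sketch reads "zero assumption" as returning no output rather than imputing a missing attribute, and applies a daily-dose assumption afterwards, on the data user's side.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical schema: 3 quantity, 3 frequency and 2 qualifier variables.
# The names are illustrative only; they are not the published PIS schema.
@dataclass
class StructuredDose:
    qty_min: float
    qty_max: float
    qty_unit: str
    freq_min: float       # minimum administrations per period
    freq_max: float       # maximum administrations per period
    period_hours: float   # length of the period the frequency refers to
    as_required: bool     # True for "when required", "prn", etc.
    qualifier_text: str   # the qualifier phrase as written ("" if none)

NUMBER_WORDS = {"half": 0.5, "one": 1, "two": 2, "three": 3, "four": 4}
NUM = r"(\d+(?:\.\d+)?|half|one|two|three|four)"

def to_number(token: str) -> float:
    token = token.lower()
    return float(NUMBER_WORDS.get(token, token))

# Quantity: a number or a range ("1-2", "one or two") followed by a unit.
QTY_RE = re.compile(
    rf"\b{NUM}(?:\s*(?:-|to|or)\s*{NUM})?\s+(tablets?|capsules?|puffs?|ml)\b", re.I
)

def _fixed(count: float, hours: float):
    return (count, count, hours)

# Frequency rules, tried in order; each maps a match to (min, max, period_hours).
FREQ_RULES = [
    (re.compile(r"\bonce (?:a day|daily)\b", re.I), lambda m: _fixed(1.0, 24.0)),
    (re.compile(r"\btwice (?:a day|daily)\b", re.I), lambda m: _fixed(2.0, 24.0)),
    (re.compile(rf"\b{NUM} times (?:a day|daily)\b", re.I),
     lambda m: _fixed(to_number(m.group(1)), 24.0)),
    (re.compile(rf"\bevery {NUM} hours?\b", re.I),
     lambda m: _fixed(1.0, to_number(m.group(1)))),
]

# Qualifier: an explicit "as required" style phrase.
PRN_RE = re.compile(r"\b(?:when|as|if) (?:required|needed)\b|\bprn\b", re.I)

def parse_dose(text: str) -> Optional[StructuredDose]:
    """Codify one instruction, or return None if quantity or frequency is not
    stated explicitly (this sketch's reading of 'zero assumption')."""
    q = QTY_RE.search(text)
    if q is None:
        return None
    qty_min = to_number(q.group(1))
    qty_max = to_number(q.group(2)) if q.group(2) else qty_min
    unit = q.group(3).lower().rstrip("s")  # "tablets" -> "tablet"

    freq = None
    for pattern, build in FREQ_RULES:
        m = pattern.search(text)
        if m:
            freq = build(m)
            break
    if freq is None:
        return None  # no frequency stated: do not guess one

    prn = PRN_RE.search(text)
    return StructuredDose(qty_min, qty_max, unit, *freq,
                          as_required=prn is not None,
                          qualifier_text=prn.group(0).lower() if prn else "")

def max_daily_quantity(dose: StructuredDose) -> float:
    """A user-side assumption, applied after codification: the maximum
    quantity is taken at the maximum frequency."""
    return dose.qty_max * dose.freq_max * 24.0 / dose.period_hours
```

For example, parse_dose("1-2 tablets every 4 hours when required") yields a quantity range of 1 to 2 tablets, one administration per 4 hours and an as-required flag, while parse_dose("As directed") yields None. Keeping max_daily_quantity outside the parser reflects the design choice described in the abstract: the codified record carries no exposure assumption, so each study can substitute its own rule (for example, the midpoint of the quantity range instead of the maximum).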
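The validation figures reported in the abstract use different denominators: coverage is taken over all 15 593 sampled instructions, whereas the error rate and accuracy are taken over the 13 152 instructions that received structured output. A worked check of the reported percentages:

```latex
\begin{align*}
\text{coverage}   &= \frac{13\,152}{15\,593} \approx 0.8434 = 84.34\% \\
\text{error rate} &= \frac{767}{13\,152} \approx 0.0583 = 5.83\% \\
\text{accuracy}   &= 1 - \frac{767}{13\,152} \approx 0.9417 = 94.17\%
\end{align*}
```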