Desiderata for Drug Classification Systems for their Use in Analyzing Large Drug Prescription Datasets

Background: Information about the billions of prescriptions written for patients each year is collected in large datasets. Drug classification systems (DCSs) are key to analyzing these datasets. However, their ability to support such analyses has not been studied.

Methods: We identified six desirable features of DCSs from the perspective of analyzing large drug prescription datasets. In addition to offering operational definitions for these desiderata, we applied them to clinically significant drugs in RxNorm for six DCSs, and used them to assess the impact of these DCSs on the analysis of a large drug prescription dataset from Medicare Part D claims.

Results: Based on these desiderata, we determined that ATC, VAC and EPC seem to be better suited to the analysis of large drug prescription datasets, because of their coverage and granularity, and because ATC and VAC support aggregation.

Introduction and Background

Prescription drugs accounted for $297.7 billion in spending in the U.S. in 2014, or 9.8% of national health expenditures that year [1]. Approximately 3 in 5 American adults report currently taking at least one prescription medication, a proportion that increased continuously from 1999 to 2012 [2], and prescription drugs cost $858 per capita in 2013 [3]. In 2015, over four billion drug prescriptions were filled at U.S. pharmacies [4]. Drug therapy is one of the pillars of U.S. health care, and large drug prescription datasets can potentially support clinical, public health, and health policy analyses.

It is often convenient to refer to drugs not as individual ingredients or products, but rather as sets of drugs that share particular characteristics, i.e. drug classes. Medications can be grouped according to different perspectives, and different drug classification systems (DCSs) have been developed for various use cases. However, it might not be easy for researchers to select a DCS for their study.
In this investigation, we review some of the characteristics of six publicly available DCSs and outline desiderata for their use in analyzing large drug prescription datasets. The DCSs under investigation are the Anatomical Therapeutic Chemical (ATC) classification system [5], developed by the World Health Organization (WHO) Collaborating Centre for Drug Statistics Methodology; the Established Pharmacological Classes (EPC) from the U.S. Food and Drug Administration (FDA); and four DCSs from the U.S. Veterans Health Administration's (VHA) National Drug File Reference Terminology (NDF-RT) [6], namely Mechanism of Action (MoA), Physiological Effect (PE), Chemical Ingredient (Chem), and the Veterans Affairs Drug Classes (VAC).

Prior research has investigated the desirable characteristics of medical terminologies (e.g. Cimino, 1998 [7]). Some desiderata identified for medical terminologies are particularly relevant to DCSs, including coverage, the existence of a hierarchical structure, granularity, and non-ambiguity. The specific contribution of our work is to apply Cimino's desiderata to DCSs and extend them where appropriate. Moreover, we provide operational definitions for these desiderata, apply them to clinically significant drugs in RxNorm for six DCSs, and use our desiderata to assess the impact of these DCSs on the analysis of 1 billion Medicare Part D claims for 4.8 million Medicare beneficiaries over nine years.

Materials

We rely heavily on the RxNorm and RxClass application programming interfaces (APIs) [8], developed by the U.S. National Library of Medicine, to select drugs of interest and associate them with their corresponding classes. A detailed explanation of how we create these associations is beyond the scope of this paper, which focuses on the desirable characteristics of DCSs; only salient features are mentioned here. Of the six DCSs, only VAC associates drug products with classes. The other DCSs link ingredients to classes.
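The difference between product-level and ingredient-level DCSs can be illustrated with a minimal sketch. It uses toy in-memory lookup tables rather than the RxNorm or RxClass APIs; every identifier, mapping, and function name below is a hypothetical placeholder. Taking the union of classes across the ingredients of a multi-ingredient product is an assumption made for illustration, not a rule stated in this paper.

```python
from typing import Dict, List, Set

# Hypothetical toy data: names stand in for RxNorm product and ingredient identifiers.
PRODUCT_TO_INGREDIENTS: Dict[str, List[str]] = {
    "product_A": ["ingredient_x"],
    "product_B": ["ingredient_x", "ingredient_y"],  # multi-ingredient product
}

# Ingredient-level DCS: classes are attached to ingredients.
INGREDIENT_TO_CLASSES: Dict[str, Set[str]] = {
    "ingredient_x": {"class_1"},
    "ingredient_y": {"class_2", "class_3"},
}

# Product-level DCS (as VAC does): classes are attached to products directly.
PRODUCT_TO_CLASSES: Dict[str, Set[str]] = {
    "product_A": {"product_class_1"},
}


def classes_via_ingredients(
    product: str,
    product_to_ingredients: Dict[str, List[str]],
    ingredient_to_classes: Dict[str, Set[str]],
) -> Set[str]:
    """Resolve a product to classes through its ingredients.

    Assumption: a multi-ingredient product receives the union of the
    classes of all of its ingredients.
    """
    classes: Set[str] = set()
    for ingredient in product_to_ingredients.get(product, []):
        classes |= ingredient_to_classes.get(ingredient, set())
    return classes


def classes_direct(product: str, product_to_classes: Dict[str, Set[str]]) -> Set[str]:
    """Resolve a product to classes when the DCS links products directly."""
    return set(product_to_classes.get(product, set()))
```

In this sketch, a product missing from a lookup table resolves to an empty set, so unclassified products can be counted rather than raising an error.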
We use RxNorm to link drug products to their ingredients, including those of multi-ingredient drugs, so that the products can then be linked to classes. For ATC, we also take

Proceedings of the 3rd Workshop on Data Mining for Medical Informatics (DMMI 2016)