Auditing the UMLS for redundant classifications

The Semantic Network (SN) of the UMLS serves as a valuable abstraction of the underlying concept repository, the Metathesaurus (META). Specifically, the SN forms a classification layer for the META: each META concept is assigned to one or more semantic types in the SN. The design rule of the SN is that a concept is explicitly assigned to the lowest applicable semantic types in the SN's IS-A hierarchy; its membership in higher semantic types is implicit and can be inferred by following the IS-A relationships. However, successive versions of the UMLS have been found to contain unnecessary simultaneous assignments of a concept to both a descendant semantic type and one of its ancestors (e.g., 8,622 such assignments in the 1998 version and 12,657 in the 2001 version). The assignment of a concept to such an ancestor semantic type is called a redundant classification. An automated auditing tool is needed to identify all of these redundant classifications. In this paper, an efficient algorithm for this auditing task is introduced. Details of its application to the current (2001) version of the UMLS are presented, and the results are discussed.
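
To make the definition of a redundant classification concrete, the following minimal Python sketch flags every concept that is assigned to a semantic type together with one of that type's ancestors. It is a naive illustration only, not the algorithm introduced in the paper. The function names, the small IS-A fragment, and the concept identifiers are hypothetical toy values, and the sketch assumes that each semantic type has at most one parent and that the hierarchy contains no cycles.

```python
from typing import Dict, List, Optional, Set, Tuple


def ancestors(sem_type: str, parent_of: Dict[str, str]) -> Set[str]:
    """Return all ancestors of sem_type by walking the IS-A links upward.

    Assumes at most one parent per type and no cycles (toy assumption).
    """
    found: Set[str] = set()
    current: Optional[str] = parent_of.get(sem_type)
    while current is not None:
        found.add(current)
        current = parent_of.get(current)
    return found


def find_redundant(
    assignments: Dict[str, Set[str]], parent_of: Dict[str, str]
) -> List[Tuple[str, str]]:
    """List (concept, ancestor_type) pairs that are redundant classifications.

    An assignment is redundant when the concept is also assigned to a
    descendant of that type, so the assignment is already implied by IS-A.
    """
    redundant: List[Tuple[str, str]] = []
    for concept, types in assignments.items():
        implied: Set[str] = set()
        for sem_type in types:
            implied |= ancestors(sem_type, parent_of)
        # Any explicitly assigned type that is also implied is redundant.
        for sem_type in sorted(types & implied):
            redundant.append((concept, sem_type))
    return redundant


# Hypothetical toy data: child type -> parent type, and concept -> assigned types.
PARENT_OF = {"Mammal": "Vertebrate", "Vertebrate": "Animal", "Animal": "Organism"}
ASSIGNMENTS = {
    "concept_a": {"Mammal", "Animal"},
    "concept_b": {"Mammal"},
    "concept_c": {"Vertebrate", "Animal", "Organism"},
}

if __name__ == "__main__":
    for concept, sem_type in find_redundant(ASSIGNMENTS, PARENT_OF):
        print(f"{concept}: assignment to {sem_type} is redundant")
```

In this toy data, concept_a is flagged because its assignment to Animal is already implied by its assignment to Mammal, while concept_b, which is assigned only to the lowest applicable type, is not flagged. The sketch recomputes ancestor sets for every assignment, which is acceptable for illustration but makes no claim to the efficiency of the paper's algorithm.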