Analysis of Error Concentrations in SNOMED

Two high-level abstraction networks for the knowledge content of a terminology, the "area taxonomy" and the "p-area taxonomy," have previously been defined. Both are derived automatically from partitions of the terminology's concepts. An important application of these networks is auditing, for which a number of systematic regimens utilizing them have been formulated. In particular, the taxonomies tend to highlight certain kinds of concept groups where errors are more likely to be found. Using results from applications of our auditing regimens to SNOMED CT, we investigate the concentration of errors among such groups. Three hypotheses pertaining to the error distributions are put forth. The results indicate that certain groups presented by the taxonomies show higher error percentages than other groups, and the bootstrap is used to assess the statistical significance of these differences. This knowledge will help direct auditing efforts to increase their impact.
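
As an illustration of how the bootstrap can be used to compare error percentages between two kinds of concept groups, the following is a minimal sketch and not the procedure used in this study. It assumes a percentile bootstrap on the difference in error rates; the function name, the resampling settings, and the data are all hypothetical and chosen only for demonstration.

```python
import random


def bootstrap_rate_difference(group_a, group_b, n_resamples=2000, seed=0):
    """Percentile-bootstrap 95% interval for the difference in error rates (a - b).

    group_a, group_b: lists of 0/1 flags, where 1 marks a concept found to be
    erroneous during auditing. All names here are illustrative assumptions.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping its original size.
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        diffs.append(sum(resample_a) / len(resample_a)
                     - sum(resample_b) / len(resample_b))
    diffs.sort()
    lower = diffs[int(0.025 * n_resamples)]
    upper = diffs[int(0.975 * n_resamples) - 1]
    observed = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)
    return observed, lower, upper


# Hypothetical audit outcomes, NOT results from the study:
# 30 erroneous concepts out of 100 in groups highlighted by a taxonomy,
# 10 erroneous concepts out of 100 in the remaining groups.
highlighted = [1] * 30 + [0] * 70
others = [1] * 10 + [0] * 90

observed, lower, upper = bootstrap_rate_difference(highlighted, others)
print(f"observed difference: {observed:.2f}, 95% interval: [{lower:.2f}, {upper:.2f}]")
```

If the resulting interval excludes zero, the difference in error rates between the two kinds of groups would be judged statistically significant at roughly the 5% level under this sketch's assumptions.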