Exploring Possible Adverse Drug Reactions by Clustering Event Sequences

Historically, the identification of adverse drug reactions has relied on manual processes whereby doctors and hospitals report incidents to a central agency. In this paper we suggest a data mining approach using administrative pharmaceutical usage data linked with hospital admissions data. Patients, represented by temporal sequences of drug usage, are clustered using unsupervised learning techniques. Such techniques rely on a distance measure, and we propose one for comparing drug usage sequences that is based on an event-type hierarchy derived from the hierarchical drug classification system. Although developed for a specific domain, the measure is applicable to other data in which event types form a hierarchical structure, such as telecommunications data. The approach modifies the Uniform Kernel K-Nearest Neighbour Clustering algorithm so that clusters are merged only when they lie within a specified distance of each other. Without this modification, clusters that are less dense yet far apart would be lost; such clusters are typical of the applications we are interested in, where outliers are important. We demonstrate the algorithm through a successful application that explores possible adverse drug events, in particular hospital admissions for severe angioedema resulting from the usage of certain drugs and drug combinations. The interesting clusters thus identified have given medical researchers clues for further investigation.
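To make the idea of a hierarchy-based distance concrete, the following is a minimal sketch and not the measure defined in the paper. It assumes that event types are drug codes whose hierarchy levels are code prefixes (ATC-style prefix lengths 1, 3, 4, 5 and 7 are an assumption, as are the example codes), and it ignores event timing. The names `event_distance`, `sequence_distance` and `may_merge` are hypothetical. The last function illustrates the merge constraint only; it is not an implementation of the Uniform Kernel K-Nearest Neighbour Clustering algorithm.

```python
from typing import Sequence

# Assumption: hierarchy levels are code prefixes of these lengths (ATC-style).
LEVELS = (1, 3, 4, 5, 7)


def event_distance(a: str, b: str) -> float:
    """Distance between two event types: 0.0 for identical codes,
    1.0 when the codes do not even share the top hierarchy level."""
    shared = 0
    for n in LEVELS:
        if len(a) >= n and len(b) >= n and a[:n] == b[:n]:
            shared += 1
        else:
            break
    return 1.0 - shared / len(LEVELS)


def sequence_distance(s: Sequence[str], t: Sequence[str]) -> float:
    """Illustrative sequence distance: symmetric mean of each event's
    distance to its closest event in the other sequence (timing ignored)."""
    if not s or not t:
        return 0.0 if not s and not t else 1.0

    def directed(x: Sequence[str], y: Sequence[str]) -> float:
        return sum(min(event_distance(e, f) for f in y) for e in x) / len(x)

    return 0.5 * (directed(s, t) + directed(t, s))


def may_merge(c1, c2, max_dist: float) -> bool:
    """Merge constraint: two clusters (lists of sequences) may be merged
    only if their closest pair of members lies within max_dist."""
    return min(sequence_distance(s, t) for s in c1 for t in c2) <= max_dist
```

Under these assumptions, two codes that differ only at the finest level are close, while codes from different top-level groups are at the maximum distance, so a threshold passed to `may_merge` keeps distant clusters from being absorbed into each other regardless of their density.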
