Semantic distance-based creation of clusters of pharmacovigilance terms and their evaluation

BACKGROUND Pharmacovigilance is the activity related to the collection, analysis and prevention of adverse drug reactions (ADRs) induced by drugs or biologics. The detection of ADRs is performed using statistical algorithms and groupings of ADR terms from the MedDRA (Medical Dictionary for Regulatory Activities) terminology. Standardized MedDRA Queries (SMQs) are groupings that have become a standard for assisting the retrieval and evaluation of MedDRA-coded ADR reports worldwide. Currently 84 SMQs have been created, but several important safety topics are not yet covered. Creating an SMQ is a long and tedious process performed by experts. It relies on manual analysis of MedDRA to find all the relevant terms to be included in the SMQ. Our objective is to propose an automatic method that assists the creation of SMQs by clustering semantically similar terms.

METHODS The experimental method relies on a specific semantic resource, on semantic distance algorithms and on clustering approaches. We perform several experiments in order to define the optimal parameters.

RESULTS Our results show that the proposed method can assist the creation of SMQs and make this process faster and more systematic. The average performance of the method is 59% precision and 26% recall. The results correlate at 0.72 with the medical doctors' judgments and at 0.78 with the medical coders' judgments.

CONCLUSIONS These results and additional evaluation indicate that the generated clusters can be efficiently used for the detection of pharmacovigilance signals, as they provide better signal detection than the existing SMQs.
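The abstract does not say which semantic distance or clustering algorithm was retained, so the following is only a minimal sketch of the general idea, under stated assumptions: an edge-counting distance over an is-a hierarchy, a radius-based grouping around a chosen term, and precision and recall computed against a reference grouping. The toy hierarchy, the term names, the threshold and all function names are hypothetical and are not taken from MedDRA, from the SMQs or from the authors' implementation.

```python
from collections import deque

# Hypothetical toy is-a hierarchy (child -> parent); not real MedDRA content.
PARENT = {
    "hepatitis": "liver disorder",
    "hepatic failure": "liver disorder",
    "jaundice": "liver disorder",
    "liver disorder": "disorder",
    "renal failure": "kidney disorder",
    "kidney disorder": "disorder",
}


def neighbours(term):
    """Return the terms one is-a edge away from `term`, in either direction."""
    linked = {child for child, parent in PARENT.items() if parent == term}
    if term in PARENT:
        linked.add(PARENT[term])
    return linked


def path_distance(a, b):
    """Edge-counting distance: number of edges on the shortest path from a to b."""
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in neighbours(node) - seen:
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    return float("inf")  # the two terms are not connected


def cluster_around(centre, terms, max_distance):
    """Group every term whose distance to `centre` is within the threshold."""
    return {t for t in terms if path_distance(centre, t) <= max_distance}


def precision_recall(cluster, reference):
    """Compare a generated cluster with a reference grouping (an SMQ stand-in)."""
    hits = len(cluster & reference)
    precision = hits / len(cluster) if cluster else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


terms = set(PARENT)  # every term that has a parent in the toy hierarchy
cluster = cluster_around("hepatitis", terms, max_distance=2)
reference = {"hepatitis", "hepatic failure", "jaundice"}  # assumed reference set
precision, recall = precision_recall(cluster, reference)
```

In this toy setting the threshold plays the role of one of the parameters to be tuned: a larger radius tends to raise recall against the reference grouping at the cost of precision, while a smaller one does the opposite.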
