Automatic alignment of medical vs. general terminologies

We propose an original automatic alignment of definitions taken from different dictionaries that may be associated with the same concept even though they carry different labels. Aligning a specialised terminology, used by librarians to index concepts, with the general vocabulary that a neophyte user employs to retrieve documents on the Internet is expected to improve the performance of the information retrieval process. The selected framework is a medical one. We propose a terminology alignment by an SVM classifier trained on a compact but relevant representation of each definition pair, built from several similarity measures and the lengths of the definitions. Three syntactic levels are investigated: Nouns, Nouns-Adjectives, and Nouns-Adjectives-Verbs. Our aim is to show that combining similarity measures offers better semantic access to the document content than a single measure, and that it improves the performance of the automatic alignment. The results obtained on the test set show the relevance of our approach, as the F-measure reaches 88%. However, this conclusion should be validated on larger corpora.

One of the most important characteristics of an information retrieval system is its ability to answer queries from both neophyte and expert users. Expert queries are formulated in a specialised language, which is generally the language used to index the documents of a particular domain, such as health. The problem is that neophyte users formulate their queries in naive language, while the documents are indexed through the concepts of specialised terminologies. Therefore, it becomes necessary to automatically align several specialised terminologies with the vocabulary shared by average users for information retrieval on the Internet.
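To make the alignment task concrete, the following minimal sketch shows how a pair of definitions could be encoded as a compact vector of similarity measures and definition lengths, the kind of representation on which an SVM classifier could then be trained. The specific measures (Jaccard, Dice, cosine), the whitespace tokenizer, and all function names are assumptions chosen for illustration; they are not necessarily the measures used in this work, and the filtering by syntactic level (Nouns, Adjectives, Verbs) is omitted.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation.

    Hypothetical stand-in for a real French tokenizer / POS filter.
    """
    tokens = (tok.strip(".,;:!?()[]\"'") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def jaccard(a, b):
    """Jaccard coefficient on the sets of tokens (assumed measure)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def dice(a, b):
    """Dice coefficient on the sets of tokens (assumed measure)."""
    sa, sb = set(a), set(b)
    total = len(sa) + len(sb)
    return 2 * len(sa & sb) / total if total else 0.0


def cosine(a, b):
    """Cosine similarity on raw term-frequency vectors (assumed measure)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pair_features(def_a, def_b):
    """Encode a definition pair as [jaccard, dice, cosine, len_a, len_b]."""
    ta, tb = tokenize(def_a), tokenize(def_b)
    return [jaccard(ta, tb), dice(ta, tb), cosine(ta, tb), len(ta), len(tb)]


if __name__ == "__main__":
    # Illustrative English definitions; the corpus in this work is French.
    specialised = "Inflammation of the liver, usually of viral origin."
    general = "A disease in which the liver becomes inflamed."
    print(pair_features(specialised, general))
```

Each such vector, labelled as aligned or not aligned, would form one training example for the classifier. Keeping several measures side by side lets the classifier weigh them jointly instead of relying on a single similarity score.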
These alignments will allow information retrieval systems to better exploit specialised terminologies and electronic dictionaries and to benefit from their respective strengths. Our aim is to achieve an accurate automatic alignment of medical definitions in French taken from several specialised terminologies with those from general dictionaries. This alignment is a difficult task since these definitions may have different labels although they relate to the same medical concept. Therefore, we have chosen to represent the specialised terminology by several concepts taken from the thesaurus Medical Subject Headings (MeSH) and the VIDAL dictionary, while the general medical vocabulary is represented by several definitions from the encyclopaedia Wikipedia and from the semantic network of Le