Paper information - Mapping Unparalleled Clinical Professional and Consumer Languages with Embedding Alignment

Mapping and translating professional but arcane clinical jargon into consumer language is essential for improving patient-clinician communication. Researchers have used existing biomedical ontologies and consumer health vocabulary dictionaries to translate between the two languages. However, such approaches depend on expert effort to build the dictionaries manually, which makes them hard to generalize and scale. In this work, we use an embedding alignment method to map words between unparalleled clinical professional and consumer language embeddings. To map semantically similar words across the two embedding spaces, we first trained word embeddings independently on two corpora, one rich in clinical professional terms and the other consisting mainly of healthcare consumer terms. We then aligned the embeddings with the Procrustes algorithm. We also investigated adversarial training with refinement. We evaluated alignment quality through similar-word retrieval, both quantitatively by computing model precision and qualitatively through human judgment. We show that the Procrustes algorithm can perform well for aligning professional and consumer language embeddings, whereas adversarial training with refinement may find some relations between the two languages.
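The alignment and evaluation steps described in the abstract can be illustrated with a minimal sketch. It assumes that a set of anchor word pairs is available (the abstract does not say how anchors were chosen), that both embedding spaces have the same dimensionality, and it uses synthetic vectors rather than real clinical or consumer embeddings. The function names `procrustes_align` and `precision_at_k` are hypothetical and are not taken from the paper.

```python
import numpy as np


def procrustes_align(X, Y):
    """Orthogonal Procrustes: return the orthogonal W minimizing ||X W - Y||_F.

    X and Y are (n_pairs, dim) arrays whose i-th rows are the vectors of an
    anchor word pair (assumed to be given).
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def precision_at_k(src, tgt, W, gold, k=1):
    """Fraction of source words whose gold target index is among the k
    nearest target vectors (cosine similarity) after mapping with W."""
    mapped = src @ W
    mapped = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    tgt_n = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    sims = mapped @ tgt_n.T
    topk = np.argsort(-sims, axis=1)[:, :k]
    hits = [gold[i] in topk[i] for i in range(len(gold))]
    return float(np.mean(hits))


# Synthetic stand-ins for the two embedding spaces (assumption: the
# "consumer" space is an exact rotation of the "professional" space).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))                  # "professional" vectors
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))  # hidden orthogonal map
Y = X @ Q                                     # "consumer" vectors

W = procrustes_align(X, Y)
p_at_1 = precision_at_k(X, Y, W, gold=list(range(len(X))), k=1)
```

On real embeddings the two spaces are not exact rotations of each other, so the recovered map is only a least-squares fit and retrieval precision would be lower than in this idealized setting.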

Peter Szolovits | Wei-Hung Weng
