Medical vocabulary mining using distributional semantics on Japanese patient blogs

Random indexing has previously been used successfully for medical vocabulary expansion in Germanic languages. In this study, we used this approach to extract medical terms from a Japanese patient blog corpus. The corpus was segmented into semantic units by a semantic role labeller, and different pre-processing and parameter settings were then evaluated. The evaluation showed that settings similar to those suitable for the previously explored Germanic languages are also suitable for Japanese, and that distributional semantics is as useful for the semi-automatic expansion of Japanese medical vocabularies as it is for medical vocabularies in Germanic languages.
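To make the random indexing step more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes the corpus has already been segmented into token lists, standing in for the semantic role labeller output. The function names (`train_random_indexing`, `expand_vocabulary`), the dimensionality, the number of non-zero elements, the window size, and the "maximum cosine to any seed term" ranking are all illustrative assumptions. Each term receives a sparse ternary index vector. A term's context vector is the sum of the index vectors of its neighbours within a sliding window. Candidate terms are then ranked by cosine similarity to known seed terms from an existing medical vocabulary.

```python
import math
import random
from collections import defaultdict


def make_index_vector(dim, nonzeros, rng):
    """Sparse ternary index vector stored as (position, sign) pairs:
    half +1 and half -1 at randomly chosen positions, all others 0."""
    positions = rng.sample(range(dim), nonzeros)
    return [(pos, 1.0 if i % 2 == 0 else -1.0) for i, pos in enumerate(positions)]


def train_random_indexing(sentences, dim=1000, nonzeros=8, window=2, seed=0):
    """Hypothetical helper: build a dense context vector per term by summing
    the index vectors of the tokens within `window` positions of it."""
    rng = random.Random(seed)
    index = {}
    for tokens in sentences:
        for tok in tokens:
            if tok not in index:
                index[tok] = make_index_vector(dim, nonzeros, rng)
    context = defaultdict(lambda: [0.0] * dim)
    for tokens in sentences:
        for i, focus in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                vec = context[focus]
                for pos, sign in index[tokens[j]]:
                    vec[pos] += sign
    return dict(context)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def expand_vocabulary(context, seeds, top_n=5):
    """Hypothetical ranking: score each non-seed term by its highest cosine
    similarity to any seed term and return the top_n candidates."""
    scores = {}
    for term, vec in context.items():
        if term in seeds:
            continue
        sims = [cosine(vec, context[s]) for s in seeds if s in context]
        if sims:
            scores[term] = max(sims)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Toy, pre-segmented "blog" sentences (illustrative data only).
corpus = [
    ["頭痛", "が", "ひどい"],
    ["腹痛", "が", "ひどい"],
    ["吐き気", "が", "する"],
    ["天気", "は", "晴れ"],
]
context = train_random_indexing(corpus, dim=1000, nonzeros=8, window=2, seed=0)
print(expand_vocabulary(context, seeds={"頭痛"}, top_n=3))
```

With this toy corpus, 腹痛 ("stomach ache") is ranked first for the seed 頭痛 ("headache"), because both terms occur in identical window contexts and therefore get identical context vectors. A realistic pipeline would also filter function words such as が and compare different pre-processing and parameter settings; the window size and dimensionality used here are arbitrary choices for the sketch.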
