Health Topics Mining in Online Medical Community

With the spread of medical informatization, more and more patients actively seek health information from online medical communities. Traditional methods based on statistical analysis are inefficient at handling the growing mass of medical texts. Building on Latent Dirichlet Allocation (LDA), we propose the Medical of Sentence LDA (MS-LDA) for short online medical texts, which exploits the distribution features of medical words in online medical communities. Disease-related hot topics are assumed to be generated by sentences, a Gaussian function is employed to fit the word distribution, and a correlation weight is used to modify word frequency so as to extend the information in sentences. Furthermore, the Unified Medical Language System (UMLS) is introduced to cluster the topic recognition results from disease-related hot topics. Experiments on three representative disease boards from www.MedHelp.org show that MS-LDA significantly improves the perplexity value and the word relevance within topics. In addition, hot topics of concern to members are automatically mined, and texts in the online medical community are automatically classified.
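
For readers unfamiliar with the perplexity measure mentioned above, the following is a minimal sketch of how perplexity is commonly computed for an LDA-style topic model. It is not the MS-LDA implementation: the function name `perplexity` and the inputs `doc_topic`, `topic_word`, and `docs` are hypothetical, and the sketch assumes that per-document topic proportions and per-topic word probabilities have already been estimated. Lower values indicate that the model predicts the observed words better.

```python
import math


def perplexity(doc_topic, topic_word, docs):
    """Perplexity = exp(-(sum of log p(w | d)) / number of tokens).

    doc_topic[d][k]  : assumed probability of topic k in document d
    topic_word[k][w] : assumed probability of word id w under topic k
    docs[d]          : list of word ids observed in document d
    """
    log_likelihood = 0.0
    n_tokens = 0
    n_topics = len(topic_word)
    for d, words in enumerate(docs):
        for w in words:
            # Marginalise over topics to get the word probability p(w | d).
            p_word = sum(doc_topic[d][k] * topic_word[k][w] for k in range(n_topics))
            log_likelihood += math.log(p_word)
            n_tokens += 1
    return math.exp(-log_likelihood / n_tokens)


# Toy usage: two topics that are both uniform over a four-word vocabulary.
toy_doc_topic = [[0.5, 0.5]]
toy_topic_word = [[0.25] * 4, [0.25] * 4]
toy_docs = [[0, 1, 2, 3]]
toy_result = perplexity(toy_doc_topic, toy_topic_word, toy_docs)
```

In the toy case every word has probability 0.25, so the perplexity equals the vocabulary size; a model that concentrates probability on the words actually observed would score lower.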
