A hotspots analysis-relation discovery representation model for revealing diabetes mellitus and obesity

Background: Obesity and diabetes impose a huge economic burden on society and have become some of the most serious public health challenges in the world. To reveal the close and complex relationships between diabetes, obesity and other diseases, and to support the search for effective treatments, we present a novel model named the representative latent Dirichlet allocation (RLDA) topic model.

Results: RLDA was applied to a corpus of more than 337,000 publications on diabetes and obesity published from 2007 to 2016. To unveil meaningful relationships between diabetes mellitus, obesity and other diseases, we performed an explicit analysis of the model output with a series of visualization tools. To assess the credibility of our discoveries, we compared them against clinical reports that were not used in the training data and found that a sufficient number of these records matched directly. Our results show that, over the last 10 years, research on diseases accompanying obesity has focused mainly on 17 of them, such as asthma, gastric disease and heart disease. Research on diabetes mellitus covers a broader scope of 26 diseases, such as Alzheimer’s disease and heart disease. The two conditions share 15 accompanying diseases: adrenal disease, anxiety, cardiovascular disease, depression, heart disease, hepatitis, hypertension, hypothalamic disease, respiratory disease, myocardial infarction, obstructive sleep apnea syndrome (OSAS), liver disease, lung disease, schizophrenia and tuberculosis. In addition, tumor necrosis factor, tumor, adolescent obesity or diabetes, inflammation, hypertension and cell are expected to be hot topics related to diabetes mellitus and obesity in the next few years.

Conclusions: With the help of RLDA, we obtained hotspots analysis and relation discovery results for diabetes and obesity, and extracted the significant relationships between them and other diseases such as Alzheimer’s disease, heart disease and tumor. We believe that the proposed representation learning algorithm can help biomedical researchers better focus their attention and optimize their research direction.
