Big Data Analysis for Personalized Health Activities: Machine Learning Processing for Automatic Keyword Extraction Approach

The obese population is increasing rapidly due to changes in lifestyle and dietary habits. Obesity can cause various complications and is becoming a social disease. Nonetheless, many obese patients are unaware of the medical treatments that are right for them. Although a variety of online and offline obesity management services have been introduced, they still fail to attract users' attention and do little to solve the problem. Obesity healthcare and personalized health activities are important factors. Since obesity is related to lifestyle habits, eating habits, and interests, I concluded that big data analysis of these factors could shed light on the problem. I therefore collected big data by applying machine learning and crawling methods to unstructured citizen health data in Korea and to search data from Naver, a Korean portal company, and Google, in order to perform keyword analysis for personalized health activities. The study visualized the big data using text mining and word clouds. It collected and analyzed data concerning interests related to obesity, changes in interest in obesity, and treatment articles. The analysis revealed a wide range of seasonal factors across spring, summer, fall, and winter. The study also completed and visualized the process of extracting keywords appropriate for the treatment of abdominal obesity and lower-body obesity. The keyword big data analysis technique for personalized health activities proposed in this paper is based on an individual's interests, level of interest, and body type. In addition, the user interface (UI) that visualizes the big data is compatible with Android and Apple iOS, so users can view the data on the app screen. Many graphs and pictures can be accessed via the menu, and significant data values are visualized through machine learning.
I therefore expect that big data analysis using keywords specific to each person will lead to measures for personalized treatment and health activities.
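
The keyword extraction step described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes documents arrive as hypothetical (season, text) pairs, uses plain term-frequency counting in place of whatever machine learning method the study applied, and runs on made-up English sample data. The stop-word list and the function names are likewise illustrative assumptions.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real pipeline would use a fuller list
# (and a Korean tokenizer for Korean-language text).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "for", "to", "in", "on", "is"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_keywords_by_season(docs, k=3):
    """Return the k most frequent non-stop-word tokens for each season.

    docs is an iterable of (season, text) pairs.
    """
    counts = {}
    for season, text in docs:
        counter = counts.setdefault(season, Counter())
        counter.update(t for t in tokenize(text) if t not in STOP_WORDS)
    return {season: counter.most_common(k) for season, counter in counts.items()}


# Made-up sample data, for illustration only.
SAMPLE_DOCS = [
    ("summer", "diet and swimming for abdominal obesity"),
    ("summer", "summer diet plan"),
    ("winter", "indoor exercise for lower body obesity"),
    ("winter", "winter exercise routine"),
]

if __name__ == "__main__":
    for season, keywords in top_keywords_by_season(SAMPLE_DOCS).items():
        print(season, keywords)
```

The resulting (keyword, count) pairs per season are the kind of input a word cloud renderer typically consumes, with the count controlling the size of each word.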
