Social Media Text Data Visualization Modeling: A Timely Topic Score Technique

With the rapid growth of large-scale text data from Internet sources such as Twitter, social media platforms have become increasingly popular sources for information extraction. The extracted text is converted to numerical form through a series of data transformations and then analyzed with text analytics models to support decision-making. Among these models, a common and popular choice is Latent Dirichlet Allocation (LDA), a topic modeling method in which topics are clusters of words in the documents, associated with fitted multivariate statistical distributions. However, these models are often poor estimators of topic proportions. This paper therefore proposes a timely topic score technique for social media text data visualization, based on a point system built on topic model outputs to support text signaling. The importance score system is intended to mitigate this weakness of topic models by taking the topic proportion outputs and assigning importance points that present text topic trends. The technique then generates visualization tools that show topic trends over the studied time period and thereby facilitate decision-making. Finally, this paper studies two real-life case examples from Twitter text sources and illustrates the efficiency of the methodology.
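To make the idea of a point system over topic proportions concrete, the following is a minimal sketch and not the paper's actual scoring rule. It assumes that per-document topic proportions have already been obtained from a fitted topic model, and that each document carries a time-period label. The function name `timely_topic_scores`, the rank-based point scheme `(3, 2, 1)`, and the toy data are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def timely_topic_scores(doc_topic, periods, points=(3, 2, 1)):
    """Sum rank-based importance points per (period, topic).

    doc_topic: list of per-document topic proportion lists (assumed input,
               e.g. the document-topic output of a fitted topic model).
    periods:   list of period labels, one per document.
    points:    hypothetical points for the 1st, 2nd, 3rd... ranked topic.
    """
    n_topics = len(doc_topic[0]) if doc_topic else 0
    scores = defaultdict(lambda: [0] * n_topics)
    for proportions, period in zip(doc_topic, periods):
        # Rank topic indices by proportion, largest first.
        ranked = sorted(range(n_topics),
                        key=lambda k: proportions[k],
                        reverse=True)
        # Only the top len(points) topics of each document earn points.
        for pts, topic in zip(points, ranked):
            scores[period][topic] += pts
    return dict(scores)


# Toy data (made up): four documents, three topics, two time periods.
doc_topic = [
    [0.60, 0.30, 0.10],
    [0.50, 0.10, 0.40],
    [0.20, 0.70, 0.10],
    [0.15, 0.80, 0.05],
]
periods = ["week1", "week1", "week2", "week2"]

scores = timely_topic_scores(doc_topic, periods)
for period in sorted(scores):
    print(period, scores[period])
```

In this toy example the first topic collects the most points in `week1` and the second topic in `week2`. Plotting such per-period scores as a line or bar chart is one way a topic trend over time could be visualized. Using ranks rather than raw proportions is a design choice of this sketch, motivated by the abstract's remark that the proportions themselves are often poorly estimated.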
