Identifying and Analyzing Popular Phrases Multi-Dimensionally in Social Media Data

With the success of social media, social network analysis has become an active research topic over the last decade. Most studies analyze the whole network from the perspective of topology or content. However, no systematic model has yet been proposed for multi-dimensional analysis of large-scale social media data, and little work has been done on identifying newly emerging popular phrases and analyzing them multi-dimensionally. In this paper, the authors first propose an interactive, systematic framework. To detect emerging popular phrases effectively and efficiently, they present an N-Pat Tree model together with several filtering mechanisms. They also propose an algorithm to find new popular phrases and analyze them multi-dimensionally. Experiments on one year of Tencent-Microblogs data demonstrate the effectiveness of the approach and yield many meaningful results.
