Early detection method for emerging topics based on Dynamic Bayesian Networks in micro-blogging networks

We propose a new method for early detection of emerging topics in micro-blogging. We identify two characteristics of emerging topics that influence topic diffusion. We build a new DBN-based model to represent the temporal evolution of keywords. Our method detects emerging topics one to two hours earlier than other methods.

Micro-blogging networks have become the most influential online social networks in recent years, and more and more people use them to obtain and spread information. Detecting topics from the large volume of tweets in micro-blogging is important for information propagation and business marketing. In particular, detecting emerging topics early can strongly support real-time intelligent systems such as real-time recommendation, ad targeting, and marketing strategy. However, most previous approaches detect emerging topics only once they have reached a large scale, and they are less effective for early detection because a topic at a relatively small size offers fewer informative properties. To solve this problem, we propose a new early detection method for emerging topics based on Dynamic Bayesian Networks (DBNs) in micro-blogging networks. We first analyze the topic diffusion process and find two main characteristics of emerging topics: attractiveness and key-node. Based on this finding, we select features from the topological properties of topic diffusion and build a DBN-based model from the conditional dependencies between features to identify emerging keywords. An emerging keyword not only occurs in a given time period with particular frequency properties, but also diffuses with specific topological properties. Finally, we cluster the emerging keywords into emerging topics using the co-occurrence relations between keywords. Experimental results on real data from Sina micro-blogging demonstrate that our method is effective and can detect emerging topics one to two hours earlier than the other methods.
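The final step, grouping emerging keywords into topics by co-occurrence, can be illustrated with a minimal sketch. The abstract does not specify the clustering procedure, so everything here is an assumption made for illustration: the function name `cluster_keywords`, the `min_cooccur` threshold, and the choice of linking keywords that appear together in at least that many tweets and then taking connected components.

```python
from collections import defaultdict
from itertools import combinations


def cluster_keywords(tweets, emerging, min_cooccur=2):
    """Group emerging keywords into topics by tweet-level co-occurrence.

    Hypothetical helper: two keywords are linked when they appear together
    in at least `min_cooccur` tweets (an assumed threshold); topics are the
    connected components of the resulting graph.
    """
    # Count how often each pair of emerging keywords shares a tweet.
    pair_counts = defaultdict(int)
    for words in tweets:
        present = sorted(set(words) & emerging)
        for a, b in combinations(present, 2):
            pair_counts[(a, b)] += 1

    # Union-find over keywords; every keyword starts as its own topic.
    parent = {k: k for k in emerging}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge keywords whose co-occurrence count reaches the threshold.
    for (a, b), count in pair_counts.items():
        if count >= min_cooccur:
            parent[find(a)] = find(b)

    # Collect the members of each component.
    groups = defaultdict(set)
    for k in emerging:
        groups[find(k)].add(k)
    return sorted(sorted(g) for g in groups.values())


# Toy data, invented for illustration only.
tweets = [
    ["earthquake", "sichuan", "rescue"],
    ["earthquake", "sichuan"],
    ["concert", "tickets"],
    ["concert", "tickets", "earthquake"],
    ["rescue", "earthquake"],
]
emerging = {"earthquake", "sichuan", "rescue", "concert", "tickets"}
topics = cluster_keywords(tweets, emerging)
```

With this toy input, the single tweet mentioning both "concert" and "earthquake" falls below the assumed threshold, so the two groups of keywords stay separate topics. Raising `min_cooccur` trades fewer spurious merges for more fragmented topics.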
