Extracting news blog hot topics based on the W2T Methodology

Although topic detection and tracking techniques have made great progress, most researchers have paid little attention to two aspects. First, topic models are constructed without taking the characteristics of different topics into account. Second, the factors that determine the formation and development of hot topics are not analyzed in depth. To extract news blog hot topics correctly, this paper approaches these problems from a new perspective based on the W2T (Wisdom Web of Things) methodology, in which the characteristics of blog users, the context of topic propagation, and information granularity are investigated in a unified way. The motivations and features of blog users are first analyzed to understand the characteristics of news blog topics. The context of topic propagation is then decomposed into the blog community, the topic network, and the opinion network. Important factors such as user behavior patterns, opinion leaders, and network opinion are identified in order to track the development trends of news blog topics. In addition, a blog hot topic detection algorithm is proposed, in which news blog hot topics are identified by measuring duration, topic novelty, the degree of user attention, and topic growth. Experimental results show that the proposed method is feasible and effective. These results are also useful for further study of how opinion leaders form in blogspace.
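
The four measures named above (duration, topic novelty, degree of user attention, and topic growth) can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not give the formulas, so the weighted sum, the equal weights, the 30-day duration cap, the 0.6 threshold, and the names `TopicStats`, `hotness`, and `hot_topics` are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class TopicStats:
    """Hypothetical per-topic measurements; names and scales are assumed."""
    name: str
    duration_days: float  # how long the topic has been active
    novelty: float        # assumed in [0, 1]: dissimilarity to earlier topics
    attention: float      # assumed in [0, 1]: normalized views and comments
    growth: float         # assumed in [0, 1]: normalized growth in posts


def hotness(topic, max_duration_days=30.0, weights=(0.25, 0.25, 0.25, 0.25)):
    """Combine the four measures into one score (assumed weighted sum)."""
    # Cap duration so that long-running topics do not dominate the score.
    duration_score = min(topic.duration_days / max_duration_days, 1.0)
    w_d, w_n, w_a, w_g = weights
    return (w_d * duration_score + w_n * topic.novelty
            + w_a * topic.attention + w_g * topic.growth)


def hot_topics(topics, threshold=0.6):
    """Return (name, score) pairs at or above the threshold, hottest first."""
    scored = [(t.name, hotness(t)) for t in topics]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    sample = [
        TopicStats("topic-a", duration_days=15, novelty=0.8, attention=0.9, growth=0.7),
        TopicStats("topic-b", duration_days=3, novelty=0.2, attention=0.1, growth=0.1),
    ]
    for name, score in hot_topics(sample):
        print(f"{name}: {score:.3f}")
```

A weighted sum is only one possible way to combine the measures; a real system would need to define how each measure is computed from blog data and to calibrate the weights and threshold against labeled topics.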
