Explicitly and implicitly exploiting the hierarchical structure for mining website interests on news events

Abstract: After a news event, many different websites publish coverage of that event, each expressing its own commentary, perspectives, and viewpoints. Websites form around a specific set of interests to cater to different audiences, and discovering these interests can help audiences, especially people and organizations that are interested in news, select the most appropriate websites to use as their sources of information. This paper presents three methods for formally defining and mining a website's interests, each of which is explicitly or implicitly based on a hierarchical structure: website-webpage-keyword. The first, and most straightforward, method explicitly uses keyword-layer network communities and the mapping relations between websites and keywords. The second method expands upon the first with an iterative algorithm that combines the mapping relations and the network relations from the website-webpage-keyword structure to further refine the keyword-layer network communities. In the third method, a website topic model implicitly captures the mapping relations among the websites, webpages, and keywords. The performance of the three proposed methods in website interest mining is compared using a bespoke evaluation metric. The experimental results show that the iterative procedure designed in the second method improves website interest mining performance, and that the website topic model in the third method achieves the best performance among the three methods.
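
As a rough illustration of the idea behind the first method, the sketch below groups keywords into communities and then maps each website onto those communities through the keywords its webpages contain. It is not the paper's algorithm: the toy data, the function names, the co-occurrence threshold, and the use of connected components as a stand-in for network community detection are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy data: website -> list of webpages -> list of keywords.
SITES = {
    "site_a": [["election", "vote", "poll"], ["vote", "poll"]],
    "site_b": [["stock", "market"], ["market", "stock", "vote"]],
}

def keyword_graph(sites, min_cooccurrence=2):
    """Link two keywords when they co-occur on at least min_cooccurrence webpages."""
    counts = Counter()
    graph = defaultdict(set)
    for pages in sites.values():
        for page in pages:
            unique = sorted(set(page))
            for kw in unique:
                graph[kw]  # register every keyword as a node, even if isolated
            for a, b in combinations(unique, 2):
                counts[(a, b)] += 1
    for (a, b), n in counts.items():
        if n >= min_cooccurrence:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)

def keyword_communities(graph):
    """Assign a community id to each keyword (connected components as a stand-in)."""
    community = {}
    next_id = 0
    for start in sorted(graph):
        if start in community:
            continue
        stack = [start]
        while stack:
            node = stack.pop()
            if node in community:
                continue
            community[node] = next_id
            stack.extend(graph[node] - community.keys())
        next_id += 1
    return community

def website_interests(sites, community):
    """Share of each website's keyword occurrences that falls in each community."""
    interests = {}
    for site, pages in sites.items():
        hits = Counter(community[kw] for page in pages for kw in page)
        total = sum(hits.values())
        interests[site] = {c: n / total for c, n in hits.items()}
    return interests

graph = keyword_graph(SITES)
community = keyword_communities(graph)
interests = website_interests(SITES, community)
```

Here `interests` maps each website to a distribution over keyword communities, which is one simple way to read "website interests" off the website-webpage-keyword hierarchy. A real community detection algorithm would replace `keyword_communities`.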
