Quantifying sentiment and influence in blogspaces

The weblog, or blog, has become a popular form of social media in which authors write posts that can in turn generate feedback in the form of user comments. Taken as a whole, a collection of blogs can thus be viewed as an informal record of mass sentiment and opinion. A natural goal is to mine this collection to gauge public sentiment on the wide variety of topics it covers. However, the sheer size of the so-called blogosphere, combined with the fact that posts can range over a practically limitless number of topics, poses serious challenges for any meaningful analysis. In particular, the fact that nearly anyone with access to the Internet can author a blog raises the issue of credibility: should some blogs be considered more influential than others and, consequently, be weighted more heavily when gauging sentiment with respect to a topic? In addition, because new posts and comments arrive almost constantly, any blog analysis algorithm must handle such updates efficiently. In this paper, we formalize the blog model. We give formal methods for quantifying sentiment and influence with respect to a hierarchy of topics, with the specific aim of facilitating the computation of a per-topic, influence-weighted sentiment measure. Finally, as efficiency is a specific end goal, we give upper bounds on the time required to update these values with new posts, showing that our analysis and algorithms are scalable.
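
As a rough illustration of what a per-topic, influence-weighted sentiment measure with cheap updates could look like, the sketch below keeps running sums for each topic and its ancestors in a topic hierarchy. It is not the formalization given in the paper: the class name `TopicSentiment`, the use of a weighted mean, the parent-pointer hierarchy, and the sample scores are all assumptions made for this example.

```python
from collections import defaultdict


class TopicSentiment:
    """Running influence-weighted sentiment per topic (hypothetical sketch)."""

    def __init__(self, parent):
        # parent maps each topic to its parent topic; root topics map to None.
        self.parent = parent
        self.weighted_sum = defaultdict(float)  # sum of influence * sentiment
        self.weight = defaultdict(float)        # sum of influence

    def add_post(self, topic, sentiment, influence):
        # Credit the post to its topic and to every ancestor topic, so a
        # single update takes time proportional to the hierarchy depth.
        while topic is not None:
            self.weighted_sum[topic] += influence * sentiment
            self.weight[topic] += influence
            topic = self.parent.get(topic)

    def score(self, topic):
        # Weighted mean sentiment, or None if the topic has no weight yet.
        total = self.weight.get(topic, 0.0)
        if total == 0:
            return None
        return self.weighted_sum[topic] / total


# Assumed toy hierarchy and scores, for illustration only.
hierarchy = {"sports": None, "soccer": "sports", "tennis": "sports"}
ts = TopicSentiment(hierarchy)
ts.add_post("soccer", sentiment=1.0, influence=3.0)   # influential, positive
ts.add_post("tennis", sentiment=-1.0, influence=1.0)  # less influential, negative
print(ts.score("sports"))
```

Because each new post touches only its own topic and that topic's ancestors, the cost of an update in this sketch does not depend on how many posts have already been processed.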
