Named Entity Trends Originating from Social Media

Many studies have sought to identify what people are interested in at a given time by analysing trends in language use in documents as they are published on the web. Few, however, have considered material whose subject matter originates in social media. The work reported here attempts to distinguish such material by filtering out features that trend primarily in news media. Trends in daily occurrences of nouns and named entities are examined using the ICWSM 2009 corpus of blogs and news articles. A significant number of trends are found to originate in social media, and named entities are found to be more prevalent in them than nouns. When features that trend in later news stories are taken as an indication of a topic of wider interest, named entities are shown to be more likely indicators, although the strongest trends are seen in nouns.
