Mining social media: tracking content and predicting behavior

The advent of social media has established a symbiotic relationship between social media and online news. This relationship can be leveraged for tracking news content and predicting behavior, with tangible real-world applications such as online reputation management, ad pricing, news ranking, and media analysis. In this thesis we focus on tracking news content in social media and on predicting user behavior. In the first part, we develop methods for tracking content that build upon and extend practices in Information Retrieval. We begin by discovering social media posts that discuss a news article but do not provide a hyperlink to it. Our methods model news articles using several channels of information, either endogenous or exogenous to the article. These models are then used to query an index of social media posts. During this process we found that the query models are close in size to the documents to be retrieved, violating the standard language modeling assumption that queries are much shorter than documents. We correct for this discrepancy by introducing two hypergeometric language models for modeling both the queries and the documents to be retrieved. In the second part, we focus on predicting behavior. First, we look at predicting listeners’ preference in spoken user-generated content, namely podcasts. Then, we predict the popularity of news articles from several news agents in terms of the volume of comments they receive. We develop models for predicting the popularity of an article both before and after it is published. Finally, we look at a different aspect of news impact: how reading a news article affects future user browsing behavior. In each setting, we find patterns that characterize the underlying behavior and extract features that we then use to build models for predicting online behavior.

[1]  Maarten de Rijke,et al.  Extracting the discussion structure in comments on news-articles , 2007, WIDM '07.

[2]  James Allan,et al.  Automatic Hypertext Construction , 1995 .

[3]  M. de Rijke,et al.  Using term clouds to represent segment-level semantic content of podcasts , 2008 .

[4]  Annerieke Heuvelink Cognitive Models for Training Simulations , 2009 .

[5]  Yi Zhang,et al.  Novelty and redundancy detection in adaptive filtering , 2002, SIGIR '02.

[6]  Donald Metzler,et al.  USC/ISI at TREC 2011: Microblog Track , 2011, TREC.

[7]  Bernardo A. Huberman,et al.  The Pulse of News in Social Media: Forecasting Popularity , 2012, ICWSM.

[8]  M.A.J. van Gerven,et al.  Bayesian networks for clinical decision support: A rational approach to dynamic decision-making under uncertainty , 2007 .

[9]  Piek Vossen,et al.  EuroWordNet: A multilingual database with lexical semantic networks , 1998, Springer Netherlands.

[10]  Maarten de Rijke,et al.  News Comments: Exploring, Modeling, and Online Prediction , 2010, ECIR.

[11]  Gediminas Adomavicius,et al.  Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions , 2005, IEEE Transactions on Knowledge and Data Engineering.

[12]  Miles Efron,et al.  Estimation methods for ranking recent information , 2011, SIGIR.

[13]  Krishna P. Gummadi,et al.  Measuring User Influence in Twitter: The Million Follower Fallacy , 2010, ICWSM.

[14]  Kerry Matthews RESEARCH INTO PODCASTING TECHNOLOGY INCLUDING CURRENT AND POSSIBLE FUTURE USES , 2006 .

[15]  Stephen E. Robertson,et al.  A probabilistic model of information retrieval: development and comparative experiments - Part 2 , 2000, Inf. Process. Manag..

[16]  Vera Kartseva,et al.  Designing Controls for Network Organisations: A Value-Based Approach , 2004 .

[17]  Craig MacDonald,et al.  Overview of the TREC 2009 Blog Track , 2009, TREC.

[18]  Michael Gamon,et al.  Predicting Responses to Microblog Posts , 2012, NAACL.

[19]  Djoerd Hiemstra,et al.  Twenty-One at TREC7: Ad-hoc and Cross-Language Track , 1998, TREC.

[20]  Soo Young Rieh,et al.  Developing a unifying framework of credibility assessment: Construct, heuristics, and interaction in context , 2008, Inf. Process. Manag..

[21]  Craig MacDonald,et al.  Using Relevance Feedback in Expert Search , 2007, ECIR.

[22]  Antoni Sobkowicz,et al.  Properties of Social Network in an Internet Political Discussion Forum , 2012, Adv. Complex Syst..

[23]  Adam Vanya,et al.  Supporting Architecture Evolution by Mining Software Repositories , 2012 .

[24]  Stefan Visscher,et al.  Bayesian network models for the management of ventilator-associated pneumonia , 2008 .

[25]  Christian Scheel,et al.  Feed Distillation Using AdaBoost and Topic Maps , 2007, TREC.

[26]  Svetha Venkatesh,et al.  Social reader: towards browsing the social web , 2014, Multimedia Tools and Applications.

[27]  Thijs Westerveld,et al.  Surface Features in Video Retrieval , 2005, Adaptive Multimedia Retrieval.

[28]  M. de Rijke,et al.  Linking Archives Using Document Enrichment and Term Selection , 2011, TPDL.

[29]  Miriam J. Metzger,et al.  Credibility for the 21st Century: Integrating Perspectives on Source, Message, and Media Credibility in the Contemporary Media Environment , 2003 .

[30]  F. V. Gils,et al.  PodVinder : spoken document retrieval for Dutch pod- and vodcasts , 2008 .

[31]  Richard D. Waters,et al.  Messaging, music, and mailbags: How technical design and entertainment boost the performance of environmental organizations’ podcasts , 2012 .

[32]  Kazuhiro Seki,et al.  TREC 2011 Microblog Track Experiments at Kobe University , 2012, TREC.

[33]  Yehuda Koren,et al.  Care to comment?: recommendations for commenting on news stories , 2012, WWW.

[34]  Jenq-Haur Wang,et al.  Finding Event-Relevant Content from the Web Using a Near-Duplicate Detection Approach , 2007, IEEE/WIC/ACM International Conference on Web Intelligence (WI'07).

[35]  W. Bruce Croft,et al.  A Markov random field model for term dependencies , 2005, SIGIR '05.

[36]  Thorsten Quandt,et al.  PARTICIPATORY JOURNALISM PRACTICES IN THE MEDIA AND BEYOND , 2008 .

[37]  Jon Kleinberg,et al.  Differences in the mechanics of information diffusion across topics: idioms, political hashtags, and complex contagion on twitter , 2011, WWW.

[38]  Gerard Salton,et al.  Improving retrieval performance by relevance feedback , 1997, J. Am. Soc. Inf. Sci..

[39]  Craig MacDonald,et al.  Intent-aware search result diversification , 2011, SIGIR.

[40]  H.H.L.M. Donkers,et al.  NOSCE HOSTEM: Searching with Opponent Models , 1997 .

[41]  Katsumi Tanaka,et al.  Complementary information retrieval for cross-media news content , 2004, MMDB '04.

[42]  O. Sharpanskykh,et al.  On Computer-Aided Methods for Modeling and Analysis of Organizations , 2008 .

[43]  Katja Hofmann,et al.  The impact of document structure on keyphrase extraction , 2009, CIKM.

[44]  C. J. van Rijsbergen,et al.  Quantification of topic propagation using percolation theory: a study of the icwsm network , 2009, ICWSM 2009.

[45]  Monika Henzinger,et al.  Detecting the origin of text segments efficiently , 2009, WWW '09.

[46]  Aitao Chen,et al.  Cross-language Retrieval Experiments at CLEF 2002 , 2002, CLEF.

[47]  Djoerd Hiemstra,et al.  Bayesian extension to the language model for ad hoc information retrieval , 2003, SIGIR.

[48]  Karianne Vermaas,et al.  Fast diffusion and broadening use: A research on residential adoption and usage of broadband internet in the Netherlands between 2001 and 2005 , 2007 .

[49]  Frank van Harmelen,et al.  Ontology-based information sharing , 2005 .

[50]  Daniele Quercia,et al.  Auralist: introducing serendipity into music recommendation , 2012, WSDM '12.

[51]  Jiyin He,et al.  Exploring topic structure: coherence, diversity and relatedness , 2012, SIGF.

[52]  Bill N. Schilit,et al.  Generating links by mining quotations , 2008, Hypertext.

[53]  Ryen W. White,et al.  Predicting user interests from contextual information , 2009, SIGIR.

[54]  Aristides Gionis,et al.  The query-flow graph: model and applications , 2008, CIKM '08.

[55]  Deborah S. Chung,et al.  Interactive Features of Online Newspapers: Identifying Patterns and Predicting Use of Engaged Readers , 2008, J. Comput. Mediat. Commun..

[56]  Ee-Peng Lim,et al.  Modeling Diffusion in Social Networks Using Network Properties , 2012, ICWSM.

[57]  Paul Buitelaar,et al.  Semantic annotation for concept-based cross-language medical information retrieval , 2002, Int. J. Medical Informatics.

[58]  Z. Aleksovski,et al.  Using background knowledge in ontology matching , 2008 .

[59]  Yasufumi Takama,et al.  Visualization of News Distribution in Blog Space , 2006, 2006 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology Workshops.

[60]  Qing Gu,et al.  Guiding Service-Oriented Software Engineering: A View-based Approach , 2011 .

[61]  Matthew Rowe,et al.  Behaviour analysis across different types of enterprise online communities , 2012, WebSci '12.

[62]  P. V. Maanen Adaptive Support for Human-Computer Teams : Exploring the Use of Cognitive Models of Trust and Attention , 2010 .

[63]  Thomas Gottron,et al.  LiveTweet: Microblog Retrieval Based on Interestingness and an Adaptation of the Vector Space Model , 2011, TREC.

[64]  Joyca Lacroix,et al.  NIM : a situated computational memory model , 2003 .

[65]  Charles L. A. Clarke,et al.  Classifying and Characterizing Query Intent , 2009, ECIR.

[66]  Alberto Barrón-Cedeño,et al.  Reducing the Plagiarism Detection Search Space on the Basis of the Kullback-Leibler Distance , 2009, CICLing.

[67]  Gianluca Demartini,et al.  Predicting the Future Impact of News Events , 2012, ECIR.

[68]  Wouter Weerkamp,et al.  Twitter hashtags: Joint Translation and Clustering , 2011 .

[69]  Elad Yom-Tov,et al.  On the Relationship between Novelty and Popularity of User-Generated Content , 2010, TIST.

[70]  Benno Stein,et al.  Information Retrieval in the Commentsphere , 2012, TIST.

[71]  C. Gerritsen Caught in the Act: Investigating Crime by Agent-Based Simulation , 2010 .

[72]  Mostafa Keikha,et al.  Blog distillation using random walks , 2009, SIGIR.

[73]  Jingfeng Xia,et al.  Let us take a Yale open course: a Chinese view of open educational resources provided by institutions in the West , 2013, J. Comput. Assist. Learn..

[74]  Bernardo A. Huberman,et al.  Predicting the popularity of online content , 2008, Commun. ACM.

[75]  James Allan,et al.  Text classification and named entities for new event detection , 2004, SIGIR '04.

[76]  Martin Franz,et al.  Unsupervised and supervised clustering for topic tracking , 2001, SIGIR '01.

[77]  Rianne Kaptein,et al.  Effective focused retrieval by exploiting query context and document structure , 2012, SIGF.

[78]  Loes M. M. Braun,et al.  Pro-active medical information retrieval , 2002 .

[79]  Hila Becker,et al.  Identification and Characterization of Events in Social Media , 2011 .

[80]  Maarten de Rijke,et al.  Hypergeometric language models for republished article finding , 2011, SIGIR '11.

[81]  Gregoris Mentzas,et al.  Using Social Media to Predict Future Events with Agent-Based Markets , 2010, IEEE Intelligent Systems.

[82]  Fernando Diaz,et al.  Improving the estimation of relevance models using large external corpora , 2006, SIGIR.

[83]  Wei Chu,et al.  A contextual-bandit approach to personalized news article recommendation , 2010, WWW '10.

[84]  Jure Leskovec,et al.  Meme-tracking and the dynamics of the news cycle , 2009, KDD.

[85]  อนิรุธ สืบสิงห์,et al.  Data Mining Practical Machine Learning Tools and Techniques , 2014 .

[86]  Wang-Chien Lee,et al.  A straw shows which way the wind blows: ranking potentially popular items from early votes , 2012, WSDM '12.

[87]  Katja Hofmann,et al.  The University of Amsterdam at TREC 2009: Blog, Web, Entity, and Relevance Feedback , 2009 .

[88]  Soo Young Rieh,et al.  Credibility: A multidisciplinary framework , 2007, Annu. Rev. Inf. Sci. Technol..

[89]  A. Bell The language of news media , 1991 .

[90]  Mounia Lalmas,et al.  A survey on the use of relevance feedback for information access systems , 2003, The Knowledge Engineering Review.

[91]  Maliha S. Nash,et al.  Handbook of Parametric and Nonparametric Statistical Procedures , 2001, Technometrics.

[92]  Andrew Trotman,et al.  Overview of the INEX 2010 Link the Wiki Track , 2010, INEX.

[93]  Ricardo Baeza-Yates,et al.  Enhancing Document Snippets Using Temporal Information , 2011, SPIRE.

[94]  Simon Carter,et al.  Exploration and exploitation of multilingual data for statistical machine translation , 2012 .

[95]  Alton Yeow-Kuan Chua,et al.  Social tags as news event detectors , 2011, J. Inf. Sci..

[96]  Craig MacDonald,et al.  Overview of the TREC 2006 Blog Track , 2006, TREC.

[97]  F. Wetenschappen,et al.  Embodied agents from a user's perspective , 2008 .

[98]  David S. Moore,et al.  The Basic Practice of Statistics [With CDROM] , 1999 .

[99]  Gilad Mishne,et al.  Leave a Reply: An Analysis of Weblog Comments , 2006 .

[100]  L. H. Christoph The role of metacognitive skills in learning to solve problems , 2006 .

[101]  Bernardo A. Huberman,et al.  Predicting the Future with Social Media , 2010, Web Intelligence.

[102]  Marti A. Hearst Untangling Text Data Mining , 1999, ACL.

[103]  M. de Rijke,et al.  Credibility-inspired ranking for blog post retrieval , 2012, Information Retrieval.

[104]  Tasos Spiliotopoulos Votes and Comments in Recommender Systems : The Case of Digg Tasos Spiliotopoulos Madeira Interactive Technologies Institute University of Madeira , 2009 .

[105]  Nicholas J. Belkin,et al.  Display time as implicit feedback: understanding task effects , 2004, SIGIR '04.

[106]  R. M. van Lambalgen,et al.  When the Going Gets Tough: Exploring Agent-based Models of Human Performance under Demanding Conditions , 2012 .

[107]  Martha Larson,et al.  Investigating the Global Semantic Impact of Speech Recognition Error on Spoken Content Collections , 2009, ECIR.

[108]  P. I. Hofgesang,et al.  Modelling Web Usage in a Changing Environment , 2009 .

[109]  W. Bruce Croft,et al.  Finding text reuse on the web , 2009, WSDM '09.

[110]  Ellen M. Voorhees,et al.  Using WordNet to disambiguate word senses for text retrieval , 1993, SIGIR.

[111]  M. Korotkiy,et al.  From Ontology-enabled Services to Service-enabled Ontologies : Making Ontologies Work in e-Science with Onto SOA , 2009 .

[112]  Peter Van Rosmalen,et al.  Supporting the tutor in the design and support of adaptive e-learning , 2008 .

[113]  Meredith Ringel Morris,et al.  #TwitterSearch: a comparison of microblog search and web search , 2011, WSDM '11.

[114]  Jan Broersen Modal Action Logics for Reasoning about Reactive Systems , 2003 .

[115]  Gerhard Weikum,et al.  A Language Modeling Approach for Temporal Information Needs , 2010, ECIR.

[116]  Jaap Kamps,et al.  The importance of anchor text for ad hoc search revisited , 2010, SIGIR '10.

[117]  F. Divina Hybrid Genetic Relational Search for Inductive Learning , 2004 .

[118]  Gianni Amati,et al.  Frequentist and Bayesian Approach to Information Retrieval , 2006, ECIR.

[119]  Jacco van Ossenbruggen,et al.  Processing structured hypermedia - a matter of style , 2001, SIKS dissertation series.

[120]  F. J. Wiesman,et al.  Information retrieval by graphically browsing meta-information , 1998 .

[121]  Valentin Jijkoun,et al.  Named entity normalization in user generated content , 2008, AND '08.

[122]  Wouter Weerkamp,et al.  Microblog language identification: overcoming the limitations of short, unedited and idiomatic text , 2012, Language Resources and Evaluation.

[123]  RiehSoo Young,et al.  Developing a unifying framework of credibility assessment , 2008 .

[124]  Munmun De Choudhury,et al.  Can blog communication dynamics be correlated with stock market activity? , 2008, Hypertext.

[125]  Mark van Assem,et al.  Converting and Integrating Vocabularies for the Semantic Web , 2010 .

[126]  Johan Bollen,et al.  Twitter mood predicts the stock market , 2010, J. Comput. Sci..

[127]  Jianfeng Gao,et al.  Linear discriminant model for information retrieval , 2005, SIGIR '05.

[128]  Lloyd A. Smith,et al.  Practical feature subset selection for machine learning , 1998 .

[129]  Geoffrey Zweig,et al.  Syntactic Clustering of the Web , 1997, Comput. Networks.

[130]  Lars Backstrom,et al.  The Anatomy of the Facebook Social Graph , 2011, ArXiv.

[131]  E. A. Fox,et al.  Combining the Evidence of Multiple Query Representations for Information Retrieval , 1995, Inf. Process. Manag..

[132]  M. de Rijke,et al.  Generating Pseudo Test Collections for Learning to Rank Scientific Articles , 2012, CLEF.

[133]  Andrei Broder,et al.  A taxonomy of web search , 2002, SIGF.

[134]  Eugueni Smirnov,et al.  Conjunctive and Disjunctive Version Spaces with Instance-based Boundary Sets , 2001 .

[135]  Thorsten Joachims,et al.  Optimizing search engines using clickthrough data , 2002, KDD.

[136]  George Ghinea,et al.  Measuring quality of perception in distributed multimedia: Verbalizers vs. imagers , 2008, Comput. Hum. Behav..

[137]  Leo Egghe,et al.  A Theoretical Study of Recall and Precision Using a Topological Approach to Information Retrieval , 1998, Inf. Process. Manag..

[138]  Laurie J. Patterson The Technology Underlying Podcasts , 2006, Computer.

[139]  Roi Blanco,et al.  Language intent models for inferring user browsing behavior , 2012, SIGIR '12.

[140]  Clement T. Yu,et al.  A theory of term importance in automatic text analysis , 1974, J. Am. Soc. Inf. Sci..

[141]  Richard M. Schwartz,et al.  A hidden Markov model information retrieval system , 1999, SIGIR '99.

[142]  Edward A. Fox,et al.  Combination of Multiple Searches , 1993, TREC.

[143]  Eugene Agichtein,et al.  Predicting information seeker satisfaction in community question answering , 2008, SIGIR '08.

[144]  JungherrAndreas,et al.  Why the Pirate Party Won the German Election of 2009 or The Trouble With Predictions , 2012 .

[145]  G. A. Mishne,et al.  Expiriments with mood classification in blog posts , 2005, SIGIR 2005.

[146]  Hubert Vogten,et al.  Design and Implementation Strategies for IMS Learning Design , 2008 .

[147]  M. de Rijke,et al.  Linking online news and social media , 2011, WSDM '11.

[148]  Felipe Bravo-Marquez,et al.  Hypergeometric Language Model and Zipf-Like Scoring Function for Web Document Similarity Retrieval , 2010, SPIRE.

[149]  Wilhelmus Lambertus Adrianus Derks Improving Concurrency and Recovery in Database Systems by Exploiting Application Semantics , 2005 .

[150]  Karen Spärck Jones A statistical interpretation of term specificity and its application in retrieval , 2021, J. Documentation.

[151]  Leo Egghe,et al.  Duality in information retrieval and the hypergeometric distribution , 1997, J. Documentation.

[152]  Avi Arampatzis,et al.  Text Filtering using Linguistically-Motivated Indexing Terms , 1999 .

[153]  JonesK. Sparck,et al.  A probabilistic model of information retrieval , 2000 .

[154]  Felipe Bravo-Marquez,et al.  A Text Similarity Meta-Search Engine Based on Document Fingerprints and Search Results Records , 2011, 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology.

[155]  Xuanjing Huang,et al.  Efficient partial-duplicate detection based on sequence matching , 2010, SIGIR.

[156]  Katrina Fenlon,et al.  Improving retrieval of short texts through document expansion , 2012, SIGIR '12.

[157]  Daisuke Ikeda,et al.  Automatically Linking News Articles to Blog Entries , 2006, AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs.

[158]  Eric Brill,et al.  Improving web search ranking by incorporating user behavior information , 2006, SIGIR.

[159]  Serge Fdida,et al.  Predicting the popularity of online articles based on user comments , 2011, WIMS '11.

[160]  Lipika Dey,et al.  Studying the effects of noisy text on text mining applications , 2009, AND '09.

[161]  Barry Smyth,et al.  Using twitter to recommend real-time topical news , 2009, RecSys '09.

[162]  Wessel Kraaij,et al.  Viewing stemming as recall enhancement , 1996, SIGIR '96.

[163]  Miriam J. Metzger Making sense of credibility on the Web: Models for evaluating online information and recommendations for future research , 2007, J. Assoc. Inf. Sci. Technol..

[164]  Xueqi Cheng,et al.  Intent-aware query similarity , 2011, CIKM '11.

[165]  M. de Rijke,et al.  Adding semantics to microblog posts , 2012, WSDM '12.

[166]  Jaime G. Carbonell,et al.  Retrieval and Feedback Models for Blog Distillation , 2007, TREC.

[167]  Seungyeop Han,et al.  Analysis of topological characteristics of huge online social networking services , 2007, WWW '07.

[168]  Lada A. Adamic,et al.  The Party Is Over Here: Structure and Content in the 2010 Election , 2011, ICWSM.

[169]  Herwin van Welbergen,et al.  Behavior Generation for Interpersonal Coordination with Virtual Humans: on Specifying, Scheduling and Realizing Multimodal Virtual Human Behavior , 2011 .

[170]  A. Bhulai,et al.  Dynamic website optimization through autonomous management of design patterns , 2011 .

[171]  Jaime G. Carbonell,et al.  Document Representation and Query Expansion Models for Blog Recommendation , 2008, ICWSM.

[172]  Gerard Salton,et al.  The SMART Retrieval System—Experiments in Automatic Document Processing , 1971 .

[173]  Gilad Mishne,et al.  Capturing Global Mood Levels using Blog Posts , 2006, AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs.

[174]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[175]  Fernando Pereira,et al.  Reading the Markets: Forecasting Public Opinion of Political Candidates by News Analysis , 2008, COLING.

[176]  Jungwoo Kim,et al.  The politics of comments: predicting political orientation of news stories with commenters' sentiment patterns , 2011, CSCW.

[177]  Maarten de Rijke,et al.  From Blogs to News: Identifying Hot Topics in the Blogosphere , 2009, TREC.

[178]  Jong Wook Kim,et al.  Organization and Tagging of Blog and News Entries Based on Content Reuse , 2010, J. Signal Process. Syst..

[179]  Wouter Weerkamp,et al.  How people use Twitter in different languages , 2011 .

[180]  Kavé Salamatian,et al.  Understanding the characteristics of online commenting , 2008, CoNEXT '08.

[181]  Jacob Lenting Informed gambling : conception and analysis of a multi-agent mechanism for discrete reallocation , 1999 .

[182]  Munmun De Choudhury,et al.  What makes conversations interesting?: themes, participants and consequences of conversations in online social media , 2009, WWW '09.

[183]  Pavel Serdyukov,et al.  Yandex at TREC 2011 Microblog Track , 2011, TREC.

[184]  Wouter Weerkamp,et al.  Finding people and their utterances in social media , 2010, SIGIR.

[185]  Ramanathan V. Guha,et al.  Information diffusion through blogspace , 2004, WWW '04.

[186]  Yin Yang,et al.  Query by document , 2009, WSDM '09.

[187]  Jonathan L. Herlocker,et al.  Evaluating collaborative filtering recommender systems , 2004, TOIS.

[188]  Nattiya Kanhabua,et al.  Time-aware approaches to information retrieval , 2012, SIGF.

[189]  Isabell M. Welpe,et al.  Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment , 2010, ICWSM.

[190]  Hosung Park,et al.  What is Twitter, a social network or a news media? , 2010, WWW '10.

[191]  David A. Cohn,et al.  The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity , 2000, NIPS.

[192]  Valentin Jijkoun,et al.  The Impact of Named Entity Normalization on Information Retrieval for Question Answering , 2008, ECIR.

[193]  M. de Rijke,et al.  Monolingual Document Retrieval for European Languages , 2004, Information Retrieval.

[194]  Soo Young Rieh Judgment of information quality and cognitive authority in the Web , 2002, J. Assoc. Inf. Sci. Technol..

[195]  R. Rosenfeld,et al.  Two decades of statistical language modeling: where do we go from here? , 2000, Proceedings of the IEEE.

[196]  CHENGXIANG ZHAI,et al.  A study of smoothing methods for language models applied to information retrieval , 2004, TOIS.

[197]  Kartik Hosanagar,et al.  Recommender systems and their impact on sales diversity , 2007, EC '07.

[198]  Torsten Suel,et al.  Modeling and predicting user behavior in sponsored search , 2009, KDD.

[199]  Ramanathan V. Guha,et al.  The predictive power of online chatter , 2005, KDD '05.

[200]  Qiang Wu,et al.  Click-through prediction for news queries , 2009, SIGIR.

[201]  Rada Mihalcea,et al.  Wikify!: linking documents to encyclopedic knowledge , 2007, CIKM '07.

[202]  Jong Wook Kim,et al.  Efficient overlap and content reuse detection in blogs and online news articles , 2009, WWW '09.

[203]  Iadh Ounis,et al.  Combining fields for query expansion and adaptive query expansion , 2007, Inf. Process. Manag..

[204]  M. de Rijke,et al.  Predicting IMDB Movie Ratings Using Social Media , 2012, ECIR.

[205]  Keejun Han,et al.  MovMe: Personalized Movie Information Retrieval , 2011 .

[206]  James Allan,et al.  An Investigation of Dirichlet Prior Smoothing's Performance Advantage , 2005 .

[207]  Roi Blanco,et al.  Ranking related news predictions , 2011, SIGIR.

[208]  Ricardo Baeza-Yates,et al.  A Multi-faceted Approach to Query Intent Classification , 2011, SPIRE.

[209]  Monika Henzinger,et al.  Finding near-duplicate web pages: a large-scale evaluation of algorithms , 2006, SIGIR.

[210]  Gilad Mishne,et al.  Applied text analytics for blogs , 2007 .

[211]  Panagiotis Takis Metaxas,et al.  How (Not) to Predict Elections , 2011, 2011 IEEE Third Int'l Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third Int'l Conference on Social Computing.

[212]  Peter Mika,et al.  Making Sense of Twitter , 2010, SEMWEB.

[213]  James Allan,et al.  Topic detection and tracking: event-based information organization , 2002 .

[214]  Krisztian Balog,et al.  The University of Amsterdam at WePS3 , 2010, CLEF.

[215]  P.A.T. van Eck,et al.  A Compositional Semantic Structure for Multi-Agent Systems Dynamics , 2001 .

[216]  M. de Rijke,et al.  Using Coherence-Based Measures to Predict Query Difficulty , 2008, ECIR.

[217]  Tad Hogg,et al.  Using a model of social dynamics to predict popularity of news , 2010, WWW '10.

[218]  Jasmine Novak,et al.  Building enriched document representations using aggregated anchor text , 2009, SIGIR.

[219]  Leif Azzopardi,et al.  An analysis on document length retrieval trends in language modeling smoothing , 2008, Information Retrieval.

[220]  F. Both Helping People by Understanding Them: Ambient Agents Supporting Task Execution and Depression Treatment , 2012 .

[221]  Zhaoxin Zhang,et al.  Human Behavior Dynamics in Online Social Media: A Time Sequential Perspective , 2012 .

[222]  Javed A. Aslam,et al.  Relevance score normalization for metasearch , 2001, CIKM '01.

[223]  Peter Christen,et al.  Event Diffusion Patterns in Social Media , 2012, ICWSM.

[224]  Nicholas J. Belkin,et al.  Understanding Judgment of Information Quality and Cognitive Authority in the WWW , 1998 .

[225]  Iadh Ounis,et al.  Overview of the TREC 2011 Microblog Track , 2011, TREC.

[226]  W. Bruce Croft,et al.  Similarity measures for tracking information flow , 2005, CIKM '05.

[227]  Yue Liu,et al.  ICTNET at Microblog Track TREC 2012 , 2012, TREC.

[228]  Ryen W. White,et al.  Mining the search trails of surfing crowds: identifying relevant websites from user activity , 2008, WWW.

[229]  Sean P. Goggins,et al.  Shepherding and Censorship: Discourse Management in the Tea Party Patriots Facebook Group , 2012, 2012 45th Hawaii International Conference on System Sciences.

[230]  David Kauchak,et al.  Modeling word burstiness using the Dirichlet distribution , 2005, ICML.

[231]  Virgílio A. F. Almeida,et al.  Traffic Characteristics and Communication Patterns in Blogosphere , 2006, ICWSM.

[232]  Vicenç Gómez,et al.  Description and Prediction of Slashdot Activity , 2007, 2007 Latin American Web Conference (LA-WEB 2007).

[233]  Timothy W. Finin,et al.  Why we twitter: understanding microblogging usage and communities , 2007, WebKDD/SNA-KDD '07.

[234]  Josep Blat,et al.  Homogeneous Temporal Activity Patterns in a Large Online Communication Space , 2007, SAW.

[235]  Dan Klass,et al.  Podcast Solutions: The Complete Guide to Podcasting (Solutions) , 1970 .

[236]  Ryen W. White,et al.  Studying trailfinding algorithms for enhanced web search , 2010, SIGIR.

[237]  Soo Young Rieh Judgement of information quality and cognitive authority in the Web , 2002 .

[238]  Eytan Adar,et al.  Implicit Structure and the Dynamics of Blogspace , 2004 .

[239]  Monika Henzinger,et al.  Query-Free News Search , 2003, WWW '03.

[240]  W. Bruce Croft,et al.  Combining the language model and inference network approaches to retrieval , 2004, Inf. Process. Manag..

[241]  Charles Elkan,et al.  Clustering documents with an exponential-family approximation of the Dirichlet compound multinomial distribution , 2006, ICML.

[242]  Jon M. Kleinberg,et al.  Group formation in large social networks: membership, growth, and evolution , 2006, KDD '06.

[243]  Z. S. Baida,et al.  Software-aided Service Bundling : Intelligent Methods and Tools for Graphical Service Modeling , 2006 .

[244]  Jaap Gordijn,et al.  Value-based requirements engineering: exploring innovative e-commerce ideas , 2003, Requirements Engineering.

[245]  Thomas Gottron,et al.  Searching microblogs: coping with sparsity and document quality , 2011, CIKM '11.

[246]  Gilad Mishne,et al.  A Study of Blog Search , 2006, ECIR.

[247]  James D. Herbsleb,et al.  Social coding in GitHub: transparency and collaboration in an open software repository , 2012, CSCW.

[248]  Gilad Mishne,et al.  Why Are They Excited? Identifying and Explaining Spikes in Blog Mood Levels , 2006, EACL.

[249]  Qi Gao,et al.  Semantic Enrichment of Twitter Posts for User Profile Construction on the Social Web , 2011, ESWC.

[250]  Martha Larson,et al.  Term clouds as surrogates for user generated speech , 2008, SIGIR '08.

[251]  Hila Becker,et al.  Learning similarity metrics for event identification in social media , 2010, WSDM '10.

[252]  Gerard Salton,et al.  A vector space model for automatic indexing , 1975, CACM.

[253]  M. de Rijke,et al.  Predicting podcast preference: An analysis framework and its application , 2010, J. Assoc. Inf. Sci. Technol..

[254]  Maarten de Rijke,et al.  A Generative Blog Post Retrieval Model that Uses Query Expansion based on External Collections , 2009, ACL/IJCNLP.

[255]  Bruno Pouliquen,et al.  Multilingual and cross-lingual news topic tracking , 2004, COLING.

[256]  Katja Hofmann,et al.  Heuristic Ranking and Diversification of Web Documents , 2009, TREC.

[257]  W. Bruce Croft,et al.  Local text reuse detection , 2008, SIGIR '08.

[258]  M. de Rijke,et al.  PodCred: a framework for analyzing podcast preference , 2008, WICOW '08.

[259]  Dan Wu,et al.  Toward a Robust data fusion for document retrieval , 2008, 2008 International Conference on Natural Language Processing and Knowledge Engineering.

[260]  Agner Fog,et al.  Calculation Methods for Wallenius' Noncentral Hypergeometric Distribution , 2008, Commun. Stat. Simul. Comput..

[261]  Ophir Frieder,et al.  Fusion of effective retrieval strategies in the same information retrieval system , 2004, J. Assoc. Inf. Sci. Technol..

[262]  Maarten de Rijke,et al.  Team COMMIT at TREC 2011 , 2011, TREC.

[263]  Max Mühlhäuser,et al.  Automatically Assessing the Post Quality in Online Discussions on Software , 2007, ACL.

[264]  Katja Hofmann,et al.  The University of Amsterdam at TREC 2010: Session, Entity and Relevance Feedback , 2010, TREC.

[265]  Aditya G. Parameswaran,et al.  Blogs as Predictors of Movie Success , 2009, ICWSM.

[266]  Masataka Goto,et al.  Automatic transcription for a web 2.0 service to search podcasts , 2007, INTERSPEECH.

[267]  L. Getoor,et al.  Link-based Text Classification , 2022 .

[268]  Matthew Lease,et al.  Beyond keywords: finding information more accurately and easily using natural language , 2010 .

[269]  Alan F. Smeaton,et al.  Using NLP or NLP Resources for Information Retrieval Tasks , 1999 .

[270]  Stanley F. Chen,et al.  An Empirical Study of Smoothing Techniques for Language Modeling , 1996, ACL.

[271]  Gilad Mishne,et al.  Predicting Movie Sales from Blogger Sentiment , 2006, AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs.

[272]  Chrysanthos Dellarocas,et al.  The Digitization of Word-of-Mouth: Promise and Challenges of Online Feedback Mechanisms , 2003, Manag. Sci..

[273]  Òscar Celma,et al.  ZemPod: A semantic web approach to podcasting , 2008, J. Web Semant..

[274]  Gilad Mishne,et al.  Finding High-Quality Content in Social Media with an Application to Community-Based Question Answering , 2007, YR-2007-005.

[275]  Marco Rosa,et al.  Four degrees of separation , 2011, WebSci '12.

[276]  V. Hollink,et al.  Optimizing hierarchical menus: a usage-based approach , 2008 .

[277]  Iadh Ounis,et al.  Overview of the TREC 2008 Blog Track , 2008, TREC.

[278]  M. de Rijke,et al.  Generating links to background knowledge: a case study using narrative radiology reports , 2011, CIKM '11.

[279]  M. de Rijke,et al.  Incorporating Query Expansion and Quality Indicators in Searching Microblog Posts , 2011, ECIR.

[280]  Eli Pariser,et al.  The Filter Bubble: What the Internet Is Hiding from You , 2011 .

[281]  Henry A. Kautz,et al.  Modeling Spread of Disease from Social Interactions , 2012, ICWSM.

[282]  Marijn Koolen,et al.  The meaning of structure: the value of link evidence for information retrieval , 2011, SIGF.

[283]  Fang Wu,et al.  Social Networks that Matter: Twitter Under the Microscope , 2008, First Monday.

[284]  Stephen J. Green Lexical semantics and automatic hypertext construction , 1999, CSUR.

[285]  J.S.J.H. Penders,et al.  The practical art of moving physical objects , 1999 .

[286]  Elizabeth D. Liddy,et al.  Assessing Credibility of Weblogs , 2006, AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs.

[287]  Miles Efron,et al.  Information search and retrieval in microblogs , 2011, J. Assoc. Inf. Sci. Technol..

[288]  Katja Hofmann,et al.  The University of Amsterdam at WePS2 , 2009 .

[289]  Alexander Löser,et al.  Near-duplicate detection for web-forums , 2009, IDEAS '09.

[290]  B. J. Fogg,et al.  Credibility and computing technology , 1999, CACM.

[291]  Gianni Amati Information Theoretic Approach to Information Extraction , 2006, FQAS.

[292]  F. Verdenius,et al.  Methodological aspects of designing induction-based applications , 2005 .

[293]  Nick Koudas,et al.  Early online identification of attention gathering items in social media , 2010, WSDM '10.

[294]  Kenneth T. Wallenius,et al.  Biased Sampling; the Noncentral Hypergeometric Probability Distribution , 1963 .

[295]  Thorsten Brants,et al.  Natural Language Processing in Information Retrieval , 2003, CLIN.

[296]  D. Mobach Agent-Based Mediated Service Negotiation , 2007 .

[297]  Mark Coates,et al.  Weblog Analysis for Predicting Correlations in Stock Price Evolutions , 2012, ICWSM.

[298]  Ophir Frieder,et al.  System fusion for improving performance in information retrieval systems , 2001, Proceedings International Conference on Information Technology: Coding and Computing.

[299]  Emre Kiciman,et al.  OMG, I Have to Tweet that! A Study of Factors that Influence Tweet Rates , 2012, ICWSM.

[300]  Mao Ye,et al.  From user comments to on-line conversations , 2012, KDD.

[301]  Tong Zhang,et al.  Text Mining: Predictive Methods for Analyzing Unstructured Information , 2004 .

[302]  W. Bruce Croft,et al.  Using Probabilistic Models of Document Retrieval without Relevance Information , 1979, J. Documentation.

[303]  Matthew Hurst,et al.  Event Detection and Tracking in Social Streams , 2009, ICWSM.

[304]  M. de Rijke,et al.  Exploiting Surface Features for the Prediction of Podcast Preference , 2009, ECIR.

[305]  Daniel M. Romero,et al.  Influence and passivity in social media , 2010, ECML/PKDD.

[306]  Mostafa Keikha,et al.  TEMPER: A Temporal Relevance Feedback Method , 2011, ECIR.

[307]  Aoying Zhou,et al.  Towards High-Quality Semantic Entity Detection over Online Forums , 2011, SocInfo.

[308]  Susan T. Dumais,et al.  The vocabulary problem in human-system communication , 1987, CACM.

[309]  Emre Velipasaoglu,et al.  Intent-based diversification of web search results: metrics and algorithms , 2011, Information Retrieval.

[310]  Amanda Spink,et al.  Determining the informational, navigational, and transactional intent of Web queries , 2008, Inf. Process. Manag..

[311]  Bernard J. Jansen,et al.  Twitter power: Tweets as electronic word of mouth , 2009, J. Assoc. Inf. Sci. Technol..

[312]  Monika Henzinger,et al.  Analysis of a very large web search engine query log , 1999, SIGF.

[313]  Johan F. Hoorn,et al.  Software Requirements: Update, upgrade, redesign. Towards a theory of requirements change , 2006 .

[314]  Carolyn Watters,et al.  Automatic association of news items , 1997, Inf. Process. Manag..

[315]  Djoerd Hiemstra,et al.  A Linguistically Motivated Probabilistic Model of Information Retrieval , 1998, ECDL.

[316]  Hsinchun Chen,et al.  Collaborative systems: solving the vocabulary problem , 1994, Computer.

[317]  Andrei Z. Broder,et al.  On the resemblance and containment of documents , 1997, Proceedings. Compression and Complexity of SEQUENCES 1997 (Cat. No.97TB100171).

[318]  Bonnie A. Nardi,et al.  Why we blog , 2004, CACM.

[319]  Jan Wielemaker,et al.  Logic programming for knowledge-intensive interactive applications , 2009 .

[320]  John D. Lafferty,et al.  A study of smoothing methods for language models applied to Ad Hoc information retrieval , 2001, SIGIR '01.

[321]  Frederick Jelinek,et al.  Statistical methods for speech recognition , 1997 .

[322]  Ivo Swartjes Whose story is it anyway? How improv informs agency and authorship of emergent narrative , 2010 .

[323]  Luis Gravano,et al.  dSCAM: finding document copies across multiple databases , 1996, Fourth International Conference on Parallel and Distributed Information Systems.

[324]  Gabriella Kazai,et al.  Towards a science of user engagement , 2011 .

[325]  Kwan-Liu Ma,et al.  Breaking news on twitter , 2012, CHI.

[326]  Junehwa Song,et al.  MovieCommenter: Aspect-based collaborative filtering by utilizing user comments , 2011, 7th International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom).

[327]  Ram Akella,et al.  A new probabilistic retrieval model based on the dirichlet compound multinomial distribution , 2008, SIGIR '08.

[328]  Yong Yu,et al.  Collaborative personalized tweet recommendation , 2012, SIGIR '12.

[329]  Michael Gertz,et al.  On the value of temporal information in information retrieval , 2007, SIGF.

[330]  Ricardo Baeza-Yates,et al.  Towards a Deeper Understanding of the User’s Query Intent , 2010 .

[331]  Dragomir R. Radev,et al.  NewsInEssence: summarizing online news topics , 2005, Commun. ACM.

[332]  Peter Willett,et al.  Readings in information retrieval , 1997 .

[333]  Yutaka Matsuo,et al.  Earthquake shakes Twitter users: real-time event detection by social sensors , 2010, WWW '10.

[334]  Michael Gamon,et al.  BLEWS: Using Blogs to Provide Context for News Articles , 2008, ICWSM.

[335]  Martin Wigbertus Antonius Caminada For the sake of the Argument: explorations into argument-based reasoning , 1997 .

[336]  Ian H. Witten,et al.  Learning to link with wikipedia , 2008, CIKM '08.

[337]  Robert Burgin,et al.  Performance Standards and Evaluations in IR Test Collections: Vector-Space and Other Retrieval Models , 1997, Inf. Process. Manag..

[338]  M. Sloof,et al.  Physiology of Quality Change Modelling. Automated modelling of quality change of agricultural products , 1999 .

[339]  George Ghinea,et al.  Quality of perception: user quality of service in multimedia presentations , 2005, IEEE Transactions on Multimedia.

[340]  M. de Rijke,et al.  Predicting the volume of comments on online news stories , 2009, CIKM.

[341]  O. Vorobyev,et al.  Discrete multivariate distributions , 2008, arXiv:0811.0406.

[342]  Matthew J. Salganik,et al.  Experimental Study of Inequality and Unpredictability in an Artificial Cultural Market , 2006, Science.

[343]  Christos Faloutsos,et al.  Cascading Behavior in Large Blog Graphs , 2007 .

[344]  Gurmeet Singh Manku,et al.  Detecting near-duplicates for web crawling , 2007, WWW '07.

[345]  Ravi Kumar,et al.  Structure and evolution of blogspace , 2004, CACM.

[346]  Sietse Overbeek,et al.  Bridging Supply and Demand for Knowledge Intensive Tasks , 2008 .

[347]  Soo-Min Kim,et al.  Automatically Assessing Review Helpfulness , 2006, EMNLP.

[348]  W. John Wilbur,et al.  Retrieval Testing with Hypergeometric Document Models , 1993, J. Am. Soc. Inf. Sci..

[349]  Bart Willem Schermer,et al.  Software Agents, Surveillance and the right to privacy , 2007 .

[350]  L. J. Kortmann The resolution of visually guided behaviour , 2003 .

[351]  Laura Hollink,et al.  Semantic annotation for retrieval of visual resources , 2006 .

[352]  W. H. van Atteveldt,et al.  Semantic Network Analysis: Techniques for Extracting, Representing, and Querying Media Content , 2008 .

[353]  D. Beal The nature of minimax search , 1999 .

[354]  Massimo Melucci,et al.  An Evaluation of Automatically Constructed Hypertexts for Information Retrieval , 1999, Information Retrieval.

[355]  Edgar Meij,et al.  Combining concepts and language models for information access , 2011, SIGF.

[356]  Zhenyu Liu,et al.  Automatic identification of user goals in Web search , 2005, WWW '05.

[357]  Christopher D. Manning,et al.  Introduction to Information Retrieval , 2010, J. Assoc. Inf. Sci. Technol..

[358]  S. Robertson The probability ranking principle in IR , 1997 .

[359]  M. Thelwall Bloggers during the London attacks: Top information sources and topics , 2006 .