Leveraging Recommender Systems for the Creation and Maintenance of Structure within Collaborative Social Media Platforms

Over the last decade, the web has transformed from a web of information consumers into a web of information producers. The advent of online social media platforms is largely responsible for this shift: people now actively contribute to knowledge bases, participate in online communities, and post on social media platforms. As a result, a vast amount of new information is produced every day. This publicly available data is an invaluable source of information that has yet to be fully exploited. Because the users of such systems are highly diverse (they come from different cultures and backgrounds, speak different languages, etc.), the information they provide shares little common structure. For example, the same objects are named differently, and information is structured in different ways. This severely limits the performance of search facilities on this data. This thesis proposes using recommender systems to create and maintain a common structure within collaborative social media platforms, with the aim of improving search performance. To this end, it presents two recommender systems for two showcase platforms. The first provides recommendations for structuring information within a semistructured information system, whereas the second is a hashtag recommender system for microblogging services.

Zusammenfassung (German summary)

Over the last decade, the web has changed from a network of information consumers into a network of information producers. The growing spread of social media platforms in particular has contributed greatly to this development. People now actively create entries in knowledge bases, take part in online communities, and participate in social media platforms. As a result, very large amounts of publicly accessible data are generated every day. However, this valuable data is not yet being fully exploited.
Due to the great diversity of users (different cultures, backgrounds, languages, etc.), the available information has only a limited common structure; for example, identical objects are often named differently, or information is structured differently. This has a negative effect on the performance of search on this data. This dissertation addresses the question of how recommender systems can be used to create and maintain a common structure in collaborative social media platforms. The goal is to improve search performance on this data. To this end, two exemplary recommender systems for two such platforms are presented. The first recommender system provides recommendations for structuring information in semistructured information systems. The second is a hashtag recommender system for microblogging platforms.