Personalized and Adaptive Semantic Information Filtering for Social Media

Kapanipathi, Pavan. Ph.D., Department of Computer Science and Engineering, Wright State University, 2016. Personalized and Adaptive Semantic Information Filtering for Social Media. The short text and real-time nature of social media platforms introduce challenges for personalized filtering, such as a lack of semantic context and a dynamically changing vocabulary. Semantic techniques and technologies can be leveraged to address these challenges and to build a personalized filtering system for social media content. Social media has experienced immense growth in recent times. These platforms are becoming increasingly common for information seeking and consumption, and with their growing popularity, information overload poses a significant challenge to users. For instance, Twitter alone generates around 500 million tweets per day, and it is impractical for users to parse through such an enormous stream to find information that is interesting to them. This situation necessitates efficient personalized filtering mechanisms that let users consume relevant, interesting information from social media. Building a personalized filtering system involves understanding users’ interests and utilizing these interests to deliver relevant information to users. These tasks primarily involve analyzing and processing social media text, which is challenging because posts are short and the medium is real-time. The challenges include: (1) Lack of semantic context: Social media posts are, on average, short, which provides limited semantic context for textual analysis. This is particularly detrimental to topic identification, which is a necessary task for mining users’ interests; (2) Dynamically changing vocabulary: Most social media websites, such as Twitter and Facebook, generate posts that are of current (timely) interest to users. Due to this real-time nature, information relevant to a topic evolves dynamically, reflecting changes in the real world. This in turn changes the vocabulary associated with these dynamic topics of interest, making it harder to filter relevant information; (3) Scalability: The number of users on social media platforms is very large, which makes it difficult for centralized systems to scale in delivering relevant information to users. This dissertation is devoted to exploring semantics and Semantic Web technologies to address the above-mentioned challenges in building a personalized information filtering system for social media. In particular, the necessary semantics is derived from crowd-sourced knowledge bases such as Wikipedia to improve context for understanding short text and dynamic topics on social media.
