Toward Tweet Entity Linking With Heterogeneous Information Networks

Twitter, a microblogging platform, has become an increasingly valuable information source, where millions of users post large numbers of tweets on a wide range of topics every day. Heterogeneous information networks, which consist of multi-typed objects and relations, are becoming more and more prevalent as a way of organizing knowledge and information. Linking an entity mention in a tweet with its corresponding entity in a heterogeneous information network is an important task, because it enriches the network with the abundant and fresh knowledge embedded in tweets. However, entity mentions are often ambiguous. In addition, tweets are short and informal, so a single tweet rarely provides enough information for entity linking. In this paper, we propose TELHIN, an unsupervised iterative clustering framework that jointly links multiple similar tweets with a heterogeneous information network. Our framework considers three dimensions of tweet similarity: (1) content similarity, (2) temporal similarity, and (3) user similarity. Appropriate weights for the similarity dimensions are learned iteratively for each entity mention, using a metric learning algorithm that leverages automatically generated pairwise constraints. Experiments on real data demonstrate the effectiveness of our framework in comparison with the baselines.
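As a rough illustration of how three similarity dimensions might be combined into one score, the sketch below takes a weighted average of a content, a temporal, and a user similarity between two tweets. It is not the TELHIN formulation: the `Tweet` record, the Jaccard token overlap, the exponential time decay with its `scale_seconds` parameter, the same-author indicator, and the fixed weights are all assumptions made for this example, whereas the framework described above learns per-mention weights through metric learning.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    """Hypothetical tweet record; field names are assumptions for this sketch."""
    text: str
    timestamp: float  # seconds since an arbitrary epoch
    user: str


def content_similarity(a: Tweet, b: Tweet) -> float:
    """Jaccard overlap of lowercased whitespace tokens (an assumed stand-in)."""
    tokens_a = set(a.text.lower().split())
    tokens_b = set(b.text.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def temporal_similarity(a: Tweet, b: Tweet, scale_seconds: float = 3600.0) -> float:
    """Exponential decay in the time gap; the one-hour scale is arbitrary."""
    return math.exp(-abs(a.timestamp - b.timestamp) / scale_seconds)


def user_similarity(a: Tweet, b: Tweet) -> float:
    """1.0 when both tweets share an author, else 0.0 (a deliberately crude proxy)."""
    return 1.0 if a.user == b.user else 0.0


def combined_similarity(a: Tweet, b: Tweet, weights: tuple) -> float:
    """Weighted average of the three dimensions, normalized by the weight sum."""
    w_content, w_temporal, w_user = weights
    total = w_content + w_temporal + w_user
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = (
        w_content * content_similarity(a, b)
        + w_temporal * temporal_similarity(a, b)
        + w_user * user_similarity(a, b)
    )
    return score / total
```

In a clustering loop, a score of this kind would serve as the pairwise affinity between tweets that contain the same mention, with the weight tuple updated between iterations instead of being fixed by hand.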
