TAGME: on-the-fly annotation of short text fragments (by Wikipedia entities)

We designed and implemented TAGME, a system that efficiently and judiciously augments plain text with pertinent hyperlinks to Wikipedia pages. What distinguishes TAGME from known systems [5,8] is that it can annotate texts that are short and poorly composed, such as search-engine result snippets, tweets, and news items. This annotation is highly informative, so any task currently addressed with the bag-of-words paradigm could benefit from it by drawing upon the millions of Wikipedia pages and their inter-relations.
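As a rough illustration of what "augmenting plain text with hyperlinks to Wikipedia pages" produces, the sketch below spots mentions with a greedy longest-match lookup in a small dictionary and then turns the links into a bag of entities that could stand in for a bag of words. It is not TAGME's algorithm: the dictionary `TOY_ANCHORS`, the functions `annotate` and `bag_of_entities`, and the `max_words` limit are all hypothetical, and the sketch performs no disambiguation, so an ambiguous mention would simply get whatever single page the dictionary lists.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int    # character offset where the mention begins
    end: int      # character offset just past the mention
    mention: str  # surface text as it appears in the input
    page: str     # title of the Wikipedia page the mention is linked to


# Hypothetical toy dictionary: lower-cased surface string -> Wikipedia page title.
# A real annotator would need a far larger dictionary and would have to choose
# among several candidate pages per string; this one maps each string to one page.
TOY_ANCHORS = {
    "diego maradona": "Diego_Maradona",
    "maradona": "Diego_Maradona",
    "mexico": "Mexico",
    "world cup": "FIFA_World_Cup",
}


def annotate(text: str, anchors: dict[str, str], max_words: int = 4) -> list[Annotation]:
    """Greedy left-to-right, longest-match-first spotting of dictionary strings."""
    # Word tokens with their character spans, so annotations point back into `text`.
    tokens = [(m.group(0).lower(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    annotations = []
    i = 0
    while i < len(tokens):
        # Try the longest n-gram starting at token i first, then shorter ones.
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            key = " ".join(tok for tok, _, _ in tokens[i:i + n])
            page = anchors.get(key)
            if page is not None:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                annotations.append(Annotation(start, end, text[start:end], page))
                i += n  # skip past the matched tokens
                break
        else:
            i += 1  # no dictionary string starts here; move to the next token
    return annotations


def bag_of_entities(annotations: list[Annotation]) -> Counter:
    """Count linked pages, analogous to counting words in a bag-of-words model."""
    return Counter(a.page for a in annotations)


if __name__ == "__main__":
    text = "Maradona played against Mexico at the World Cup."
    for a in annotate(text, TOY_ANCHORS):
        print(a.start, a.end, a.mention, "->", a.page)
    print(bag_of_entities(annotate(text, TOY_ANCHORS)))
```

Matching on lower-cased `\w+` tokens while keeping their character offsets means each mention could be rewritten as a hyperlink in place, and preferring the longest match lets a multi-word string such as "diego maradona" win over its sub-phrase "maradona".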

[1] Nan Sun, et al. Exploiting internal and external semantics for the clustering of short texts using world knowledge, 2009, CIKM.

[2] Evgeniy Gabrilovich, et al. Feature Generation for Text Categorization Using World Knowledge, 2005, IJCAI.

[3] Mehran Sahami, et al. A web-based kernel function for measuring the similarity of short text snippets, 2006, WWW.

[4] Rada Mihalcea, et al. Wikify!: linking documents to encyclopedic knowledge, 2007, CIKM.

[5] Dawid Weiss, et al. A survey of Web clustering engines, 2009, CSUR.

[6] Paolo Ferragina, et al. A personalized search engine based on Web-snippet hierarchical clustering, 2005, WWW.

[7] Silviu Cucerzan, et al. Large-Scale Named Entity Disambiguation Based on Wikipedia Data, 2007, EMNLP.

[8] Evgeniy Gabrilovich, et al. Wikipedia-based Semantic Interpretation for Natural Language Processing, 2014, J. Artif. Intell. Res.

[9] Ian H. Witten, et al. An effective, low-cost measure of semantic relatedness obtained from Wikipedia links, 2008.

[10] Ganesh Ramakrishnan, et al. Collective annotation of Wikipedia entities in web text, 2009, KDD.

[11] Chaomei Chen, et al. Mining the Web: Discovering knowledge from hypertext data, 2004, J. Assoc. Inf. Sci. Technol.

[12] Somnath Banerjee, et al. Clustering short texts using Wikipedia, 2007, SIGIR.

[13] Stanislaw Osinski. Improving Quality of Search Results Clustering with Approximate Matrix Factorisations, 2006, ECIR.

[14] Gerhard Weikum, et al. YAGO: A Core of Semantic Knowledge, 2007, WWW.

[15] Ian H. Witten, et al. Mining Meaning from Wikipedia, 2008, Int. J. Hum. Comput. Stud.

[16] Hua Li, et al. Enhancing text clustering by leveraging Wikipedia semantics, 2008, SIGIR.

[17] Ramanathan V. Guha, et al. TAP: A Semantic Web Test-bed, 2003, J. Web Semant.

[18] Yin Yang, et al. Query by document, 2009, WSDM.

[19] Lyle H. Ungar, et al. Web-scale named entity recognition, 2008, CIKM.

[20] Giuseppe Attardi, et al. Semantically Annotated Snapshot of the English Wikipedia, 2008, LREC.

[21] Ian H. Witten, et al. Learning to link with Wikipedia, 2008, CIKM.

[22] Daniel S. Weld, et al. Autonomously semantifying Wikipedia, 2007, CIKM.