Knowledge-based graph document modeling

We propose a graph-based semantic model for representing document content. Our method uses a semantic network, namely the DBpedia knowledge base, to acquire fine-grained information about entities and their semantic relations, resulting in a knowledge-rich document model. We demonstrate the benefits of these semantic representations on two tasks: entity ranking and computing document semantic similarity. To this end, we couple DBpedia's structure with an information-theoretic measure of concept association based on its explicit semantic relations, and we compute semantic similarity using a measure based on Graph Edit Distance (GED), which finds the optimal matching between the documents' entities using the Hungarian method. Experimental results show that our general-purpose model outperforms baselines built on traditional methods and performs close to highly specialized methods tuned to these specific tasks.
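The abstract states that document similarity is computed with a GED-based measure that finds the optimal matching between two documents' entities using the Hungarian method. The sketch below illustrates that idea under our own simplifying assumptions, not the authors' implementation. It follows the general bipartite approximation of GED (Riesen and Bunke), where a cost matrix of substitutions, deletions and insertions is solved as an assignment problem. It ignores the edges of the document graphs and uses unit deletion/insertion costs with a substitution cost of `1 - relatedness`. It also normalizes the distance by `max(n, m)`. The functions `hungarian`, `ged_entity_distance`, `entity_similarity` and the `TOY_REL` table are hypothetical. In the model described above, entity relatedness is derived from DBpedia's weighted relations; here it is replaced by a hand-written table with made-up values.

```python
from typing import Callable, List, Sequence, Tuple

BIG = 1e9  # large finite cost marking forbidden cells of the GED cost matrix


def hungarian(cost: List[List[float]]) -> List[int]:
    """Kuhn-Munkres (Hungarian) method with potentials for an n x m matrix, n <= m.
    Returns assign[i] = column assigned to row i, minimizing the total cost."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)  # row/column potentials (1-indexed)
    p = [0] * (m + 1)    # p[j] = row matched to column j (0 means free)
    way = [0] * (m + 1)  # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:  # grow the alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip matched/unmatched edges along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assign = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assign[p[j] - 1] = j - 1
    return assign


def ged_entity_distance(doc_a: Sequence[str], doc_b: Sequence[str],
                        sub_cost: Callable[[str, str], float],
                        del_cost: float = 1.0, ins_cost: float = 1.0
                        ) -> Tuple[float, List[Tuple[str, str]]]:
    """Approximate GED over entity nodes only, via an (n+m) x (n+m) assignment."""
    n, m = len(doc_a), len(doc_b)
    size = n + m
    if size == 0:
        return 0.0, []
    c = [[0.0] * size for _ in range(size)]  # bottom-right block stays 0
    for i in range(n):
        for j in range(m):
            c[i][j] = sub_cost(doc_a[i], doc_b[j])     # substitution block
        for k in range(n):
            c[i][m + k] = del_cost if k == i else BIG  # deletion diagonal
    for k in range(m):
        for j in range(m):
            c[n + k][j] = ins_cost if j == k else BIG  # insertion diagonal
    assign = hungarian(c)
    dist = sum(c[r][assign[r]] for r in range(size))
    pairs = [(doc_a[i], doc_b[assign[i]]) for i in range(n) if assign[i] < m]
    return dist, pairs


def entity_similarity(doc_a: Sequence[str], doc_b: Sequence[str],
                      relatedness: Callable[[str, str], float]) -> float:
    """Similarity in [0, 1], assuming relatedness(a, b) lies in [0, 1]."""
    if not doc_a and not doc_b:
        return 1.0
    dist, _ = ged_entity_distance(doc_a, doc_b, lambda a, b: 1.0 - relatedness(a, b))
    return 1.0 - dist / max(len(doc_a), len(doc_b))


# Hypothetical relatedness scores, standing in for a knowledge-base-derived measure.
TOY_REL = {
    frozenset({"Berlin", "Germany"}): 0.8,
    frozenset({"Paris", "France"}): 0.8,
    frozenset({"Berlin", "Paris"}): 0.5,
}


def toy_relatedness(a: str, b: str) -> float:
    return 1.0 if a == b else TOY_REL.get(frozenset({a, b}), 0.0)


if __name__ == "__main__":
    print(entity_similarity(["Berlin", "Germany"], ["Germany", "Berlin"], toy_relatedness))  # 1.0
    print(entity_similarity(["Berlin", "Germany"], ["Paris", "France"], toy_relatedness))   # 0.25
```

The square layout lets the solver decide, for every entity, whether it is matched to an entity in the other document or deleted/inserted, with forbidden cells set to a large constant. Because a substitution here never costs more than 1, one feasible solution substitutes `min(n, m)` pairs and deletes or inserts the rest. The distance is therefore at most `max(n, m)`, which keeps the similarity in [0, 1].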
