Information extraction for knowledge base construction in the music domain

The rate at which information about music is being created and shared on the web is growing exponentially. However, making sense of all this data remains an open problem. In this paper, we present and evaluate an Information Extraction pipeline aimed at the construction of a Music Knowledge Base. Our approach first collects thousands of stories about songs from the songfacts.com website. We then combine a state-of-the-art Entity Linking tool with a linguistically motivated rule-based algorithm to extract semantic relations between entity pairs. Next, relations with similar semantics are grouped into clusters by exploiting syntactic dependencies. These relations are ranked by a novel confidence measure based on statistical and linguistic evidence. Evaluation is carried out intrinsically, by assessing each component of the pipeline, and extrinsically, in a task in which we evaluate the contribution of natural language explanations to music recommendation. We demonstrate that our method is able to discover, with high precision, novel facts that are missing from current generic as well as music-specific knowledge repositories.

A system that constructs a Music Knowledge Base entirely from scratch.
A method for clustering and scoring relations in a Relation Extraction pipeline.
Reveals music facts absent from knowledge repositories (e.g. Wikipedia).
Explains music recommendations in natural language.
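The clustering and ranking steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the clustering key or the confidence measure, so this example assumes a crude stand-in for both. Relation phrases are reduced to their content words (in place of syntactic dependencies), and each cluster is scored by its relative frequency (in place of the statistical and linguistic evidence). The names `STOPWORDS`, `cluster_key`, `cluster_relations`, and `rank_clusters`, as well as the placeholder triples, are hypothetical.

```python
from collections import defaultdict

# Hypothetical stopword list; a real pipeline would rely on a dependency parse instead.
STOPWORDS = {"was", "is", "by", "the", "a", "an", "in", "on", "of", "for", "with"}


def cluster_key(relation_phrase):
    """Reduce a relation phrase to its content words (assumed stand-in for a dependency pattern)."""
    tokens = relation_phrase.lower().split()
    return tuple(t for t in tokens if t not in STOPWORDS)


def cluster_relations(triples):
    """Group (subject, relation, object) triples whose relation phrases share a key."""
    clusters = defaultdict(list)
    for subj, rel, obj in triples:
        clusters[cluster_key(rel)].append((subj, rel, obj))
    return dict(clusters)


def rank_clusters(clusters):
    """Score each cluster by relative frequency (an assumed, purely statistical confidence)."""
    total = sum(len(members) for members in clusters.values())
    scored = [(key, len(members) / total) for key, members in clusters.items()]
    # Highest score first; ties are broken by key for a deterministic order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Placeholder triples, not extracted from any real corpus.
triples = [
    ("Song A", "was written by", "Artist X"),
    ("Song B", "written by", "Artist Y"),
    ("Song C", "was recorded in", "Studio Z"),
]

clusters = cluster_relations(triples)
ranked = rank_clusters(clusters)
for key, score in ranked:
    print(" ".join(key), round(score, 2))
```

Here "was written by" and "written by" fall into the same cluster because they share the content word "written". A measure that also draws on linguistic evidence, as the abstract describes, would replace the frequency score in `rank_clusters`.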
