A Semantic-Based Approach for Artist Similarity

This paper describes and evaluates a method for computing artist similarity from a set of artist biographies. The proposed method aims to leverage the semantic information present in these biographies, and can be divided into three main steps: (1) entity linking, i.e. detecting mentions of named entities in the text and linking them to an external knowledge base; (2) deriving a knowledge representation from these mentions in the form of a semantic graph or a mapping to a vector-space model; and (3) computing semantic similarity between documents. We test this approach on a corpus of 188 artist biographies and a larger dataset of 2,336 artists, both gathered from Last.fm. The former is mapped to the MIREX Audio and Music Similarity evaluation dataset, so that its similarity judgments can be used as ground truth. For the latter dataset we use the similarity between artists as provided by the Last.fm API. Our evaluation results show that an approach that computes similarity over a graph of entities and semantic categories clearly outperforms a baseline that exploits word co-occurrences and latent factors.
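The three steps above can be illustrated with a minimal Python sketch. It is not the method evaluated in the paper: the toy gazetteer `GAZETTEER` stands in for a real entity linker and knowledge base, the biographies are invented, and Jaccard overlap of linked-entity sets is a deliberately simple substitute for the graph-based and vector-space similarity measures. All names in the code are hypothetical.

```python
import re

# Assumption: a toy gazetteer replaces a real entity linker and knowledge base.
GAZETTEER = {
    "rock": "kb:Rock_music",
    "jazz": "kb:Jazz",
    "liverpool": "kb:Liverpool",
    "london": "kb:London",
    "chicago": "kb:Chicago",
}

# Assumption: invented biographies, for illustration only.
BIOGRAPHIES = {
    "Artist A": "A rock band formed in Liverpool.",
    "Artist B": "A rock group from Liverpool and London.",
    "Artist C": "A jazz pianist from Chicago.",
}


def link_entities(text):
    """Step 1 (simplified): map word tokens to knowledge-base identifiers."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {GAZETTEER[t] for t in tokens if t in GAZETTEER}


def build_graph(biographies):
    """Step 2 (simplified): bipartite artist-entity graph as adjacency sets."""
    return {artist: link_entities(bio) for artist, bio in biographies.items()}


def jaccard(a, b):
    """Step 3 (simplified): overlap of two entity sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_similar(graph, artist):
    """Rank all other artists by descending similarity to `artist`."""
    scores = [
        (other, jaccard(graph[artist], entities))
        for other, entities in graph.items()
        if other != artist
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


graph = build_graph(BIOGRAPHIES)
ranking = rank_similar(graph, "Artist A")
print(ranking)
```

In this toy setting, "Artist B" ranks above "Artist C" for "Artist A" because the two share linked entities even though their biographies use different words ("band" versus "group"). That is the kind of signal an entity-based representation is intended to capture.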
