Improving Content-Based Similarity Measures by Training a Collaborative Model

We observed that for multimedia data – especially music collaborative similarity measures perform much better than similarity measures derived from content-based sound features. Our observation is based on a large scale evaluation with >250,000,000 collaborative data points crawled from the web and >190,000 songs annotated with content-based sound feature sets. A song mentioned in a playlist is regarded as one collaborative data point. In this paper we present a novel approach to bridging the performance gap between collaborative and contentbased similarity measures. In the initial training phase a model vector for each song is computed, based on collaborative data. Each vector consists of 200 overlapping unlabelled 'genres' or song clusters. Instead of using explicit numerical voting, we use implicit user profile data as collaborative data source, which is, for example, available as purchase histories in many large scale ecommerce applications. After the training phase, we used support vector machines based on content-based sound features to predict the collaborative model vectors. These predicted model vectors are finally used to compute the similarity between songs. We show that combining collaborative and content-based similarity measures can help to overcome the new item problem in e-commerce applications that offer a collaborative similarity recommender as service to their customers.

[1]  David G. Stork,et al.  Pattern Classification , 1973 .

[2]  Jonathan Foote,et al.  Content-based retrieval of music and audio , 1997, Other Conferences.

[3]  Beth Logan,et al.  Content-Based Playlist Generation: Exploratory Experiments , 2002, ISMIR.

[4]  Daniel P. W. Ellis,et al.  A Large-Scale Evaluation of Acoustic and Subjective Music-Similarity Measures , 2004, Computer Music Journal.

[5]  J. T. Foote,et al.  "Content-Based Retrieval of Music and Audio," Multimedia Storage and Archiving System II , 1997 .

[6]  Daniel P. W. Ellis,et al.  The Quest for Ground Truth in Musical Artist Similarity , 2002, ISMIR.

[7]  Shigeo Abe DrEng Pattern Classification , 2001, Springer London.

[8]  Ian H. Witten,et al.  Data mining: practical machine learning tools and techniques with Java implementations , 2002, SGMD.

[9]  Peter Brusilovsky,et al.  User modeling and user adapted interaction , 2001 .

[10]  William W. Cohen,et al.  Web-collaborative filtering: recommending music by crawling the Web , 2000, Comput. Networks.

[11]  Greg Linden,et al.  Amazon . com Recommendations Item-to-Item Collaborative Filtering , 2001 .

[12]  Charles A. Micchelli,et al.  On Learning Vector-Valued Functions , 2005, Neural Computation.

[13]  Mark B. Sandler,et al.  Classification of audio signals using statistical features on time and wavelet transform domains , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[14]  George Tzanetakis,et al.  Musical genre classification of audio signals , 2002, IEEE Trans. Speech Audio Process..

[15]  Steve Lawrence,et al.  Inferring Descriptions and Similarity for Music from Community Metadata , 2002, ICMC.

[16]  Robin D. Burke,et al.  Hybrid Recommender Systems: Survey and Experiments , 2002, User Modeling and User-Adapted Interaction.

[17]  Nikita Borisov,et al.  Querying Large Collections of Music for Similarity , 2000 .

[18]  Peter Knees,et al.  Artist Classification with Web-Based Data , 2004, ISMIR.

[19]  John Riedl,et al.  Combining Collaborative Filtering with Personal Agents for Better Recommendations , 1999, AAAI/IAAI.