Web-collaborative filtering: recommending music by crawling the Web

Abstract We show that it is possible to collect data that are useful for collaborative filtering (CF) using an autonomous Web spider. In CF, entities are recommended to a new user based on the stated preferences of other, similar users. We describe a CF spider that collects from the Web lists of semantically related entities. These lists can then be used by existing CF algorithms by encoding them as `pseudo-users'. Importantly, the spider can collect useful data without pre-programmed knowledge about the format of particular pages or particular sites. Instead, the CF spider uses commercial Web-search engines to find pages likely to contain lists in the domain of interest, and then applies previously proposed heuristics to extract lists from these pages. We show that data collected by this spider are nearly as effective for CF as data collected from real users, and more effective than data collected by two plausible hand-programmed spiders. In some cases, autonomously spidered data can also be combined with actual user data to improve performance.

[1]  Naoki Abe,et al.  Collaborative Filtering Using Weighted Majority Prediction Algorithms , 1998, ICML.

[2]  Thomas G. Dietterich What is machine learning? , 2020, Archives of Disease in Childhood.

[3]  John Riedl,et al.  GroupLens: an open architecture for collaborative filtering of netnews , 1994, CSCW '94.

[4]  Pattie Maes,et al.  Social information filtering: algorithms for automating “word of mouth” , 1995, CHI '95.

[5]  Nicholas Kushmerick,et al.  Wrapper Induction for Information Extraction , 1997, IJCAI.

[6]  Manfred K. Warmuth,et al.  Learning binary relations using weighted majority voting , 2004, Machine Learning.

[7]  Craig A. Knoblock,et al.  Modeling Web Sources for Information Integration , 1998, AAAI/IAAI.

[8]  Mark Rosenstein,et al.  Recommending and evaluating choices in a virtual community of use , 1995, CHI '95.

[9]  William W. Cohen,et al.  Recommendation as Classification: Using Social and Content-Based Information in Recommendation , 1998, AAAI/IAAI.

[10]  Henry Lieberman,et al.  Letizia: An Agent That Assists Web Browsing , 1995, IJCAI.

[11]  Oren Etzioni,et al.  Adaptive Web sites , 2000, CACM.

[12]  John Riedl,et al.  Combining Collaborative Filtering with Personal Agents for Better Recommendations , 1999, AAAI/IAAI.

[13]  Hector Garcia-Molina,et al.  Extracting Semistructured Information from the Web. , 1997 .

[14]  Oren Etzioni,et al.  Adaptive Web Sites: an AI Challenge , 1997, IJCAI.

[15]  William W. Cohen Recognizing Structure in Web Pages using Similarity Queries , 1999, AAAI/IAAI.