A lower bound on the number of rankings required in recommender systems using collaborative filtering

We consider the situation where users rank items from a given set, and each user ranks only a (small) subset of all items. We assume that users can be classified into C classes, and that users in a given class c have the same ranking for all items. For this situation we are interested in the following question: as a function of the number of users N in a given class c and the number of items I_N to be ranked, how many rankings m_N per user are needed in order to correctly identify all users in class c? This question is of interest because correctly identifying all users in a class allows us to accurately predict the ranking of an item that a given user has not ranked, but that was ranked by another user in the same class. This is exactly the goal of recommender systems using collaborative filtering. Therefore, answering the above question allows us to characterize how much data (i.e., how many rankings per user) a recommender system using collaborative filtering requires to accurately predict user-item ranking pairs. We study the above question using a random graph model. Even though the resulting random graph is not an Erdős-Rényi graph, this model allows us to use techniques similar to those developed for the analysis of Erdős-Rényi graphs.
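
As a rough illustration of the kind of random graph such a question leads to, the following Python sketch simulates one possible model. Everything in it is an assumption made for illustration and is not taken from the analysis itself: each of the users in a class ranks a fixed number of items chosen uniformly at random, two users are linked when they have ranked at least one common item, and the class is treated as identifiable when the resulting overlap graph is connected. The function names and parameters (`sample_rankings`, `overlap_graph_is_connected`, `connectivity_rate`, `num_users`, `num_items`, `per_user`) are hypothetical.

```python
import random


def sample_rankings(num_users, num_items, per_user, rng):
    """Assumed sampling model: each user ranks `per_user` distinct items,
    chosen uniformly at random from `num_items` items."""
    return [set(rng.sample(range(num_items), per_user)) for _ in range(num_users)]


def overlap_graph_is_connected(ranked):
    """Return True if the user overlap graph is connected.

    Users are nodes; two users are linked when they ranked a common item.
    Connectivity is computed with union-find, merging every user with the
    first user seen for each item (this yields the same components as
    linking all pairs of users that share an item).
    """
    if not ranked:
        return True
    parent = list(range(len(ranked)))

    def find(x):
        # Find the component representative, with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_user_for_item = {}
    for user, items in enumerate(ranked):
        for item in items:
            if item in first_user_for_item:
                parent[find(user)] = find(first_user_for_item[item])
            else:
                first_user_for_item[item] = user

    root = find(0)
    return all(find(u) == root for u in range(len(ranked)))


def connectivity_rate(num_users, num_items, per_user, trials=200, seed=0):
    """Empirical fraction of sampled overlap graphs that are connected."""
    rng = random.Random(seed)
    hits = sum(
        overlap_graph_is_connected(
            sample_rankings(num_users, num_items, per_user, rng)
        )
        for _ in range(trials)
    )
    return hits / trials
```

Calling `connectivity_rate` for a fixed number of users and items while increasing `per_user` gives an empirical feel for how many rankings per user are needed before the overlap graph is typically connected under these assumptions. Edges in this graph are not independent, since they are induced by shared items, which is one way a graph of this kind differs from an Erdős-Rényi graph.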