PerRank: Personalized Rank Retrieval with Categorical and Numerical Attributes

Ranking is widely used for intelligent data retrieval in both the database and machine learning communities. Recent studies have integrated these two approaches to support soft queries, which rank results with numerical attributes based on a user's sense of relevance and preference. In real life, however, it is desirable to use categorical attributes together with numerical ones in ranking. For example, when buying a car, categorical attributes such as make, model, color, and equipment are as significant as numerical attributes such as price and year. Meanwhile, users often lack the domain knowledge to formulate an effective selection query over categories, and formulating a ranking is even more challenging because categories have no inherent ordering. In this paper, we propose PerRank (Personalized Ranking with Categorical and Numerical Attributes), a framework that supports personalized ranking with both categorical and numerical attributes for soft queries. For efficient computation, we develop CAC (Clustering-based Attribute Construction), an algorithm that makes use of a clustering method. Extensive experiments show that CAC is effective and efficient at supporting ranking with both categorical and numerical attributes for soft queries.
