Consensus-Based Ranking of Multivalued Objects: A Generalized Borda Count Approach

In this paper, we tackle a novel problem of ranking multivalued objects, where each object has multiple instances in a multidimensional space and the number of instances per object is not fixed. Given an ad hoc scoring function that assigns a score to each multidimensional instance, we want to rank a set of multivalued objects. Unlike existing models for ranking uncertain and probabilistic data, which model an object as a random variable whose instances are mutually exclusive, our setting must capture the coexistence of instances. To tackle the problem, we advocate the semantics of favoring widely preferred objects instead of majority votes, a semantics widely used in many elections and competitions. Technically, we borrow the idea of Borda Count (BC), a well-recognized method in consensus-based voting systems. However, Borda Count cannot handle multivalued objects of inconsistent cardinality, and it is costly for evaluating top-k queries on large multidimensional data sets. To address these challenges, we extend and generalize Borda Count to quantile-based Borda Count, and develop efficient computational methods with comprehensive cost analysis. We present case studies on real data sets to demonstrate the effectiveness of the generalized Borda Count ranking, and use synthetic and real data sets to verify the efficiency of our computational methods.
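To make the voting intuition concrete, the following is a minimal Python sketch. `borda_count` implements the classic rule: on a ballot ranking n candidates, the candidate in position i (0-based, best first) earns n - 1 - i points. `quantile_borda` is a hypothetical simplification assumed purely for illustration, not the paper's exact definition: each instance earns the fraction of other objects' instances it strictly outscores under the ad hoc scoring function, and an object is summarized by the phi-quantile of those fractions, so objects with different numbers of instances remain comparable. It uses a naive quadratic scan, not the efficient computational methods developed in the paper; all function names and the toy data are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Instance = Tuple[float, ...]


def borda_count(ballots: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Classic Borda Count: on each ballot ranking n candidates, the
    candidate in position i (0-based, best first) gets n - 1 - i points."""
    points: Dict[str, int] = {}
    for ballot in ballots:
        n = len(ballot)
        for i, cand in enumerate(ballot):
            points[cand] = points.get(cand, 0) + (n - 1 - i)
    return points


def quantile(values: Sequence[float], phi: float) -> float:
    """Lower phi-quantile (0 < phi <= 1): the value at sorted position
    ceil(phi * len(values)) - 1 of a non-empty sequence."""
    ordered = sorted(values)
    idx = max(math.ceil(phi * len(ordered)) - 1, 0)
    return ordered[idx]


def quantile_borda(objects: Dict[str, List[Instance]],
                   score: Callable[[Instance], float],
                   phi: float = 0.5) -> List[Tuple[str, float]]:
    """Hypothetical quantile-based Borda-style ranking (illustration only).

    Each instance earns the fraction of other objects' instances it strictly
    outscores; an object's value is the phi-quantile of its instances'
    fractions. Requires at least two objects, each with >= 1 instance.
    """
    scored = {name: [score(x) for x in insts] for name, insts in objects.items()}
    result: List[Tuple[str, float]] = []
    for name, own in scored.items():
        # Pool the scores of every instance belonging to a different object.
        others = [s for other, ss in scored.items() if other != name for s in ss]
        if not others:
            raise ValueError("need at least two non-empty objects")
        fractions = [sum(1 for o in others if s > o) / len(others) for s in own]
        result.append((name, quantile(fractions, phi)))
    # Rank objects by their quantile value, highest first.
    return sorted(result, key=lambda t: t[1], reverse=True)


if __name__ == "__main__":
    # Toy multivalued objects: each instance is (points, rebounds) for one game.
    players = {
        "A": [(30, 5), (10, 2)],
        "B": [(20, 8), (18, 7), (22, 6)],
        "C": [(5, 1)],
    }
    print(quantile_borda(players, lambda x: x[0] + x[1], phi=0.5))
```

On this toy data, with the score taken as the sum of both attributes, the sketch ranks B first, then A, then C: B outranks A even though A holds the single highest-scoring instance, because every one of B's instances is consistently strong. This mirrors, in miniature, the preference for widely preferred objects over a few outstanding ones.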
