Transactions on Large-Scale Data- and Knowledge-Centered Systems XXXVIII

Finding the top-ranked products for a given user's preference is a user-view rank model that helps users find the products they want. Recently, another query processing problem, the reverse rank query, has attracted significant research interest. The reverse rank query is a manufacturer-view model that finds users for a given product. It can help to target potential users or to position a specific product in marketing analysis. Unfortunately, previous reverse rank queries consider only one product, so they cannot identify users for product bundling, a common sales strategy. To address this limitation, we propose a new query, the aggregate reverse rank query, which finds matching users for a set of products. Three aggregate rank functions (SUM, MIN, MAX) are proposed to evaluate a given product bundle in different ways and to target different users. To answer these queries efficiently, we propose a bound-and-filter framework. In the bound phase, two points are found that bound the query set, so that candidates outside the bounds can be excluded. In the filter phase, two tree-based methods use these bounds: the tree pruning method (TPM) and the double-tree method (DTM). Theoretical analysis and experimental results demonstrate the efficacy of the proposed methods.
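
The query described above can be illustrated with a brute-force sketch. It is not the bound-and-filter framework, TPM, or DTM, and it makes several assumptions that the abstract does not state: users are weight vectors, products are attribute vectors, the score is their inner product, a smaller score is preferred, and the rank of a product for a user is the number of catalog products with a strictly smaller score. All function names are hypothetical.

```python
from typing import Callable, Iterable, List, Sequence

Vector = Sequence[float]


def score(w: Vector, p: Vector) -> float:
    """Assumed linear scoring function: inner product of weights and attributes."""
    return sum(wi * pi for wi, pi in zip(w, p))


def rank(w: Vector, q: Vector, products: Sequence[Vector]) -> int:
    """Number of catalog products that user w scores strictly better (lower) than q."""
    sq = score(w, q)
    return sum(1 for p in products if score(w, p) < sq)


def aggregate_rank(
    w: Vector,
    bundle: Sequence[Vector],
    products: Sequence[Vector],
    agg: Callable[[Iterable[int]], int] = sum,
) -> int:
    """Combine the per-product ranks of a bundle with sum, min, or max."""
    return agg(rank(w, q, products) for q in bundle)


def aggregate_reverse_rank(
    users: Sequence[Vector],
    bundle: Sequence[Vector],
    products: Sequence[Vector],
    k: int,
    agg: Callable[[Iterable[int]], int] = sum,
) -> List[int]:
    """Indices of the k users with the smallest aggregate rank (ties broken by index)."""
    order = sorted(
        range(len(users)),
        key=lambda i: (aggregate_rank(users[i], bundle, products, agg), i),
    )
    return order[:k]


# Toy data, invented purely for illustration.
products = [(1, 5), (2, 2), (5, 1), (4, 4)]
users = [(0.9, 0.1), (0.1, 0.9), (0.5, 0.5)]
bundle = [(1, 4), (4, 1)]

best_by_min = aggregate_reverse_rank(users, bundle, products, k=1, agg=min)
best_by_max = aggregate_reverse_rank(users, bundle, products, k=1, agg=max)
```

On this toy data, MIN favors a user for whom at least one bundled product ranks very well, while MAX favors a user for whom no bundled product ranks badly, so the two functions can select different users for the same bundle. The sketch scans every user and every product; reducing that cost is what the bound and filter phases are for.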
