Paper information - Transactions on Large-Scale Data- and Knowledge-Centered Systems XXXVIII

Finding the top-ranked products for a given user’s preference is a user-view rank model that helps users find the products they want. Recently, another query processing problem, the reverse rank query, has attracted significant research interest. The reverse rank query is a manufacturer-view model that finds users for a given product. It can help target potential users or find the placement of a specific product in marketing analysis. Unfortunately, previous reverse rank queries consider only one product, so they cannot identify users for product bundling, a common sales strategy. To address this limitation, we propose a new query, the aggregate reverse rank query, which finds matching users for a set of products. Three aggregate rank functions (SUM, MIN, MAX) are proposed to evaluate a given product bundle in different ways and target different users. To process these queries more efficiently, we propose a novel and sophisticated bound-and-filter framework. In the bound phase, two points are found that bound the query set, so that candidates outside the bounds can be excluded. In the filter phase, two tree-based methods are implemented with the bounds: the tree pruning method (TPM) and the double-tree method (DTM). Theoretical analysis and experimental results demonstrate the efficacy of the proposed methods.
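
The abstract does not spell out the query definition, so the following is only a brute-force sketch under stated assumptions, not the TPM or DTM method. It assumes that a user is a non-negative weight vector, that a product's score is the weighted sum of its attributes with smaller scores preferred, and that a product's rank for a user is the number of catalogue products scoring strictly lower. All function names here (`score`, `rank`, `aggregate_rank`, `aggregate_reverse_rank`, `bounding_points`) are hypothetical. The last function illustrates one plausible reading of the "two points" in the bound phase: the lower and upper corners of the bundle's bounding box, which bracket every bundle member's score when weights are non-negative.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

Vector = Sequence[float]

def score(w: Vector, p: Vector) -> float:
    """Assumed scoring function: weighted sum, smaller is better."""
    return sum(wi * pi for wi, pi in zip(w, p))

def rank(w: Vector, q: Vector, products: Sequence[Vector]) -> int:
    """Assumed rank: how many catalogue products strictly beat q for user w."""
    s = score(w, q)
    return sum(1 for p in products if score(w, p) < s)

# The three aggregate rank functions named in the abstract.
AGGREGATES: Dict[str, Callable[[Iterable[int]], int]] = {
    "SUM": sum,
    "MIN": min,
    "MAX": max,
}

def aggregate_rank(w: Vector, bundle: Sequence[Vector],
                   products: Sequence[Vector], agg: str = "SUM") -> int:
    """Combine the per-product ranks of a bundle for one user."""
    return AGGREGATES[agg]([rank(w, q, products) for q in bundle])

def aggregate_reverse_rank(users: Sequence[Vector], bundle: Sequence[Vector],
                           products: Sequence[Vector], k: int,
                           agg: str = "SUM") -> List[int]:
    """Indices of the k users with the smallest aggregate rank.

    Brute force: every user is scored against every product.
    Ties are broken by user index (an arbitrary choice for this sketch).
    """
    order = sorted(
        range(len(users)),
        key=lambda i: (aggregate_rank(users[i], bundle, products, agg), i),
    )
    return order[:k]

def bounding_points(bundle: Sequence[Vector]) -> Tuple[Vector, Vector]:
    """Lower and upper corners of the bundle's bounding box (assumed bounds)."""
    dims = range(len(bundle[0]))
    low = tuple(min(q[d] for q in bundle) for d in dims)
    up = tuple(max(q[d] for q in bundle) for d in dims)
    return low, up

# Toy data. Integer weights keep the arithmetic exact; scaling a weight
# vector by a positive constant does not change any rank.
products = [(1, 5), (2, 2), (5, 1), (4, 4), (3, 3)]
bundle = [(2, 3), (4, 1)]
users = [(9, 1), (5, 5), (2, 8)]

top_sum = aggregate_reverse_rank(users, bundle, products, k=1, agg="SUM")
top_min = aggregate_reverse_rank(users, bundle, products, k=1, agg="MIN")
top_max = aggregate_reverse_rank(users, bundle, products, k=1, agg="MAX")
```

On this toy data, MIN selects user 2 (one bundle item ranks first for that user), while SUM and MAX select user 1, which illustrates how the choice of aggregate function can target different users. The brute-force loop costs one score evaluation per user, product, and bundle item, which is the cost a bound-and-filter approach aims to reduce.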
