High-Throughput Subset Matching on Commodity GPU-Based Systems

Large-scale information processing often relies on subset matching for data classification and routing. Examples include publish/subscribe and stream processing systems, database systems, social media, and information-centric networking. For instance, consider an advanced Twitter-like messaging service in which users can follow specific publishers as well as specific topics encoded as tag sets. Such a service must join a stream of published messages with the users and their preferred tag sets so that each matching user tag set is a subset of the message tags. Subset matching is an old but notoriously difficult problem. We present TagMatch, a system that solves this problem by taking advantage of a hybrid CPU/GPU stream processing architecture. TagMatch targets large-scale applications with thousands of matching operations per second against hundreds of millions of tag sets. We evaluate TagMatch on an advanced message streaming application, with very positive results both in absolute terms and in comparison with existing systems. As a notable example, our experiments demonstrate that TagMatch running on a single commodity machine with two GPUs can easily sustain the traffic throughput of Twitter, even augmented with expressive tag-based selection.
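The matching condition described above (a user is selected when one of the user's tag sets is a subset of the message tags) can be illustrated with a naive sketch. This is not TagMatch's algorithm or its CPU/GPU architecture; it is a brute-force linear scan meant only to make the subset predicate concrete. The function name `match_subscribers`, the `subscriptions` layout, and the sample users and tags are all hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List


def match_subscribers(
    message_tags: Iterable[str],
    subscriptions: Dict[str, List[FrozenSet[str]]],
) -> List[str]:
    """Return the users with at least one tag set contained in the message tags.

    Brute-force scan over every stored tag set; illustrative only.
    """
    tags = frozenset(message_tags)
    return sorted(
        user
        for user, tag_sets in subscriptions.items()
        # `ts <= tags` is Python's subset test on (frozen)sets.
        if any(ts <= tags for ts in tag_sets)
    )


# Hypothetical subscription table: each user holds one or more tag sets.
subscriptions = {
    "alice": [frozenset({"gpu", "cuda"})],
    "bob": [frozenset({"databases"}), frozenset({"gpu", "networking"})],
    "carol": [frozenset({"gpu"})],
}

matched = match_subscribers({"gpu", "cuda", "streams"}, subscriptions)
print(matched)
```

A scan like this touches every stored tag set for every message, which is the cost a system aimed at hundreds of millions of tag sets has to avoid.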
