A Proxy-Based Query Aggregation Method for Distributed Key-Value Stores

Distributed key-value stores (D-KVS) are a critical backbone for SNS and cloud services. Some D-KVS are based on a ring architecture with multiple database nodes to handle large amounts of data. Any node can receive queries from clients, and the receiving node forwards a query to the appropriate node if necessary. As a result, this architecture imposes heavy packet-processing overhead on each node. Some D-KVS have adopted fast packet processing frameworks such as DPDK, but this alone is not enough to handle a huge number of requests. We introduce a query aggregation method for D-KVS to reduce network traffic. In our approach, a centralized proxy aggregates client queries into a few large query packets. The proxy receives every query from the clients and routes the aggregated queries to their destination nodes. The proxy is built on top of a DPDK-based network stack and can cope with a growing number of clients by increasing the number of CPU cores used for packet handling. We evaluated the method in an environment of three Cassandra nodes connected by a 10 Gbps network. Our approach improved throughput by 19% compared with Cassandra without the proxy.
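
The aggregation idea can be illustrated with a minimal Python sketch. It is not the DPDK-based implementation described above. The class name `AggregatingProxy`, the `max_batch` threshold, the single-token-per-node ring, and the use of MD5 as the ring hash are all assumptions made for illustration. The sketch buffers each client query under the node that owns its key and emits one combined "packet" per node once the buffer is full.

```python
import bisect
import hashlib
from collections import defaultdict


def token(key: str) -> int:
    """Map a string to a ring position (MD5 is an illustrative stand-in)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class AggregatingProxy:
    """Hypothetical proxy that batches queries per destination node."""

    def __init__(self, nodes, max_batch=4):
        # Assumption: one token per node; real rings often use many.
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]
        self._max_batch = max_batch
        self._pending = defaultdict(list)
        # Stand-in for the wire: each entry is one aggregated packet.
        self.sent = []

    def owner(self, key: str) -> str:
        # First node token at or after the key's token, wrapping around.
        i = bisect.bisect_left(self._tokens, token(key)) % len(self._ring)
        return self._ring[i][1]

    def submit(self, key: str, query: str) -> None:
        node = self.owner(key)
        self._pending[node].append(query)
        if len(self._pending[node]) >= self._max_batch:
            self.flush(node)

    def flush(self, node: str) -> None:
        batch = self._pending.pop(node, [])
        if batch:
            self.sent.append((node, batch))

    def flush_all(self) -> None:
        for node in list(self._pending):
            self.flush(node)


proxy = AggregatingProxy(["node-a", "node-b", "node-c"], max_batch=4)
for i in range(20):
    proxy.submit(f"user:{i}", f"GET user:{i}")
proxy.flush_all()

# Fewer packets than queries, and every query is delivered exactly once.
print(len(proxy.sent), sum(len(batch) for _, batch in proxy.sent))
```

In this sketch the batch-size threshold is the only flush trigger besides the explicit `flush_all()`. A real proxy would presumably also need a timeout so that a lightly loaded node's queries are not held indefinitely, but that policy is not specified in the text above.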
