STI-BT: A Scalable Transactional Index

Distributed Key-Value (DKV) stores are widely used to manage online transaction processing over large data sets. DKV stores provide simple primitives that access data by the primary key of the stored objects. To help programmers retrieve data efficiently, some DKV stores provide distributed indexes. In addition, to simplify the programming of such applications, several proposals provide strong consistency abstractions via distributed transactions. In this paper we present STI-BT, a highly scalable transactional index for DKV stores. STI-BT is organized as a distributed B+Tree and adopts an innovative design that achieves high efficiency in large-scale, elastic DKV stores. It thus provides both of the desirable properties identified above (indexed data access and strongly consistent transactions), and does so far more efficiently and scalably than the few existing state-of-the-art proposals that also offer strongly consistent distributed transactional indexes. We have implemented STI-BT on top of an open-source DKV store and deployed it on a public cloud infrastructure. Our extensive study demonstrates scalability in a cluster of 100 machines, and speedups of up to 5.4× with respect to the state of the art.
