Paper information - On brewing fresh espresso: LinkedIn's distributed data serving platform

Espresso is a document-oriented distributed data serving platform built to address LinkedIn's requirements for a scalable, performant, source-of-truth primary store. It provides a hierarchical document model, transactional support for modifications to related documents, real-time secondary indexing, on-the-fly schema evolution, and a timeline-consistent change-capture stream. This paper describes the motivation and design principles involved in building Espresso, the data model and capabilities exposed to clients, and details of the replication and secondary indexing implementation, and presents a set of experimental results that characterize the performance of the system along various dimensions. When we set out to build Espresso, we chose to apply industry best practices, published research, and our own internal experience with different consistency models. Along the way, we built a novel generic distributed cluster management framework, a partition-aware change-capture pipeline, and a high-performance inverted index implementation.