Paper Information - Data Infrastructure at LinkedIn

Data Infrastructure at LinkedIn

LinkedIn is among the largest social networking sites in the world. As the company has grown, our core data sets and request processing requirements have grown as well. In this paper, we describe a few selected data infrastructure projects at LinkedIn that have helped us accommodate this increasing scale. Most of these projects build on existing open source projects and are themselves available as open source. The projects covered in this paper include: (1) Voldemort: a scalable and fault-tolerant key-value store, (2) Databus: a framework for delivering database changes to downstream applications, (3) Espresso: a distributed data store that supports flexible schemas and secondary indexing, and (4) Kafka: a scalable and efficient messaging system for collecting various user activity events and log data.
