Conquering Big Data with Spark

Today, organizations large and small collect huge amounts of data, and they do so with one goal in mind: to extract "value" through sophisticated exploratory analysis and use it as the basis for decisions as varied as personalized treatment and ad targeting. To address this challenge, we have developed the Berkeley Data Analytics Stack (BDAS), an open source data analytics stack for big data processing. In this talk I'll focus on the execution engine in BDAS: Apache Spark. Spark is a cluster computing engine that is optimized for in-memory processing and unifies support for a variety of workloads, including batch, streaming, and iterative computations. Spark is now the most active big data project in the open source community and is already being used by over one thousand organizations.