The Berkeley Data Analytics Stack (BDAS)

Summary form only given. The session on “The Berkeley Data Analytics Stack” shall elucidate its current components which include Spark, Shark and Mesos with emphasis on Spark and it's real-time extension called Spark-Streaming which adds stream processing capabilities to Spark. One-liners describing each of these technologies are as follows: 1) BDAS is an open source, next-generation data analytics stack under development at the UC Berkeley AMPLab. 2) Spark, a high-speed cluster computing system compatible with Hadoop that can outperform it by up to 100x thanks to its ability to perform computations in memory. 3) Shark, a port of Apache Hive onto Spark that is compatible with existing Hive warehouses and queries. Shark can answer HiveQL queries up to 100x faster than Hive without modification to the data and queries, and is also open source as part of BDAS. 4) Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications or frameworks. It can run Hadoop, MPI, Hypertable, Spark, and other applications on a dynamically shared pool of nodes. 5) Apart, from an elaborate explanation of various facets of Spark, the session would also aim to walk through machine learning algorithm benchmarking and examples that would substantiate the concepts covered.