Zohmg—A Large Scale Data Store for Aggregated Time-series-based Data

Analyzing data on a massive scale is one of the biggest challenges facing Last.fm. Interpreting patterns in user behaviour becomes difficult when millions of users interact in billions of combinations; the data sets must be analyzed, summarized and presented visually. This thesis describes a data store for multi-dimensional time-series-based data, in which measurements are summarized across multiple dimensions. The data store is optimized for speed of data retrieval: one of the design goals is to serve data at mouse-click rate to promote real-time data exploration. Similar data stores exist, but they generally use relational database systems as their backing database. The novelty of our approach is to model multi-dimensional data cubes on top of a distributed, column-oriented database in order to reap the scalability benefits of such databases.
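
To make the idea of summarizing measurements across multiple dimensions more concrete, the following is a minimal sketch in Python. It is not the thesis's implementation: the in-memory dictionary merely stands in for a column-oriented table, and the row-key layout, the function names `record` and `query`, and the sample dimensions (`country`, `artist`) and unit (`plays`) are all assumptions made for illustration. It shows how pre-computing one aggregate per combination of dimensions turns each later query into a single row lookup.

```python
from collections import defaultdict
from itertools import combinations

# Stand-in for a column-oriented table: row key -> {unit -> running total}.
# A real deployment would use a distributed database instead (assumption).
table = defaultdict(lambda: defaultdict(int))


def make_row_key(day, fixed):
    """Build a hypothetical row key from the fixed dimension values and the day."""
    parts = [f"{name}={fixed[name]}" for name in sorted(fixed)]
    return "|".join(parts + [day])


def record(day, dimensions, measurements):
    """Add one event to every aggregate (subset of dimensions) that covers it."""
    names = sorted(dimensions)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            fixed = {name: dimensions[name] for name in subset}
            row = table[make_row_key(day, fixed)]
            for unit, value in measurements.items():
                row[unit] += value


def query(day, unit, **fixed):
    """Read a pre-computed aggregate with a single row lookup."""
    return table.get(make_row_key(day, fixed), {}).get(unit, 0)


record("20090101", {"country": "SE", "artist": "Abba"}, {"plays": 3})
record("20090101", {"country": "SE", "artist": "Kent"}, {"plays": 2})
record("20090101", {"country": "UK", "artist": "Abba"}, {"plays": 4})

print(query("20090101", "plays"))                # all plays that day
print(query("20090101", "plays", country="SE"))  # plays from one country
print(query("20090101", "plays", artist="Abba")) # plays of one artist
```

The trade-off in this sketch is that writes fan out to every combination of dimensions, so storage grows quickly with the number of dimensions, while reads stay cheap regardless of how many events were recorded.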