MIDAS: Open-source framework for distributed online analysis of data streams

Abstract

Data streams are pervasive, but implementing online analysis of streaming data is often nontrivial because streams come in different, domain-specific formats. Regardless of the stream, the analysis task is essentially the same: features are extracted from the stream and then used, for example, as input to machine learning and data mining methods. We present the Modular Integrated Distributed Analysis System (MIDAS), a framework for constructing distributed online stream processing systems for heterogeneous data. MIDAS makes it possible to process raw data streams, extract features, perform machine learning, and make the results available through an HTTP API for easy integration with various applications. MIDAS is agnostic with regard to the type of data stream and is suitable for multiple domains.
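To make the feature-extraction step concrete, the following minimal sketch computes summary features over a sliding window of a generic numeric stream. It is an illustration only and does not use the MIDAS API: the function name `windowed_features`, the window size, and the choice of mean and standard deviation as features are all hypothetical.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Dict, Iterable, Iterator


def windowed_features(stream: Iterable[float], size: int) -> Iterator[Dict[str, float]]:
    """Yield summary features for each full sliding window over `stream`.

    Hypothetical helper for illustration; not part of MIDAS.
    """
    window = deque(maxlen=size)  # oldest sample is dropped automatically
    for sample in stream:
        window.append(sample)
        if len(window) == size:  # emit only once the window is full
            yield {"mean": mean(window), "std": pstdev(window)}


# A short list stands in for a live data stream.
features = list(windowed_features([1.0, 2.0, 3.0, 4.0], size=2))
```

Because the function consumes any iterable and yields features lazily, the same code would apply to an unbounded source; the resulting feature dictionaries could then be passed to a learning method or serialized for an HTTP response.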