Modelling Processes of Big Data Analytics

Analytics tasks in scientific and industrial environments are performed in an order that, as a whole, represents the rationale of a specific process on the data. Beyond their mere size, the challenge in processing the data lies in their dispersion and the variety of their formats. Data analysis may include a range of tasks, created by various users such as business analysts, engineers, and end-users, to be executed on a range of query engines. Depending on their role and expertise, users may need or care for a different level of abstraction with respect to the execution of individual tasks and of the overall process. Therefore, a system for Big Data analytics should enable the expression of tasks in an abstract manner, adaptable to the user's role, interest, and expertise. In this work we discuss the modelling of Big Data analytics. We propose a novel representation model for analytics tasks and overall processes that encapsulates not only their declaration but also their execution semantics. The model allows the definition of analytics processes at varying levels of abstraction, adaptable to the user's role. Our motivation derives from real use cases.