A Framework for Big Data Analytics

Big Data may reside in multiple, dispersed sources and adhere to a variety of formats. Its analysis may involve a range of tasks executed on a range of query engines. Taken together, the tasks represent the rationale of a specific process. The users who create such a process may have various roles, such as business analysts, engineers, or end-users. Each role may need, or care about, a different level of abstraction with respect to the execution of the individual tasks and of the overall process. It is therefore necessary to enable the expression of analytics tasks in an abstract manner that adapts to the user's role, interest, and expertise. We propose a framework for expressing complex processes that perform analytics on Big Data and for preparing them for execution. The framework enables the expression of such processes as workflows and prepares these workflows for execution in two ways: by determining and clarifying the execution semantics of individual tasks, and by manipulating each workflow to create an equivalent workflow that will be executed optimally, either alone or together with other workflows.
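
As a rough illustration of the workflow abstraction described above, the following minimal sketch models a workflow as a directed acyclic graph of abstract tasks. It is not the framework's actual design: the names Task, Workflow, operator, engine, and merge are hypothetical, and the rule that two tasks with the same name and definition can be shared between workflows is an assumption made only for this example.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Optional


@dataclass(frozen=True)
class Task:
    """An abstract analytics task (hypothetical model, not the paper's)."""
    name: str
    operator: str                 # abstract operation, e.g. "filter"
    engine: Optional[str] = None  # unset until execution semantics are fixed


@dataclass
class Workflow:
    """A workflow as a DAG: task name -> names of the tasks it depends on."""
    tasks: dict = field(default_factory=dict)
    deps: dict = field(default_factory=dict)

    def add(self, task, after=()):
        self.tasks[task.name] = task
        self.deps[task.name] = set(after)

    def order(self):
        # One valid execution order; raises CycleError if the graph has a cycle.
        return list(TopologicalSorter(self.deps).static_order())

    def merge(self, other):
        # Assumption: tasks with the same name and definition are shared,
        # so a common prefix of two workflows is kept only once.
        merged = Workflow(dict(self.tasks),
                          {k: set(v) for k, v in self.deps.items()})
        for name, task in other.tasks.items():
            if name in merged.tasks and merged.tasks[name] != task:
                raise ValueError(f"conflicting definitions for task {name!r}")
            merged.tasks[name] = task
            merged.deps.setdefault(name, set()).update(other.deps[name])
        return merged


# Two workflows that share the same "load" and "filter" tasks.
w1 = Workflow()
w1.add(Task("load", "scan"))
w1.add(Task("filter", "filter"), after=["load"])
w1.add(Task("aggregate", "group_by"), after=["filter"])

w2 = Workflow()
w2.add(Task("load", "scan"))
w2.add(Task("filter", "filter"), after=["load"])
w2.add(Task("export", "write"), after=["filter"])

combined = w1.merge(w2)
print(combined.order())
```

In this sketch the engine field is left unset, reflecting the idea that a user can describe a task abstractly and leave the choice of query engine to a later preparation step. Merging is only one simple example of a manipulation that yields a workflow to be executed together with another.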