Big Data Exploration Via Automated Orchestration of Analytic Workflows

Large-scale data exploration on Big Data platforms requires orchestrating complex analytic workflows composed of atomic analytic components for data selection, feature extraction, modeling, and scoring. In this paper, we propose an approach that combines planning and machine learning to automatically determine the most appropriate data-driven workflows to execute in response to a user-specified objective. We combine this with orchestration mechanisms to automatically deploy, adapt, and manage such workflows across Big Data platforms. We present results of this automated exploration in real healthcare settings.
