Challenges of automated machine learning on causal impact analytics for policy evaluation

Automated machine learning (AutoML) refers to automating the full machine learning workflow without a human in the analytics loop. The main goals of big data analytics are to identify correlations, make predictions, and establish cause-effect relationships among high-dimensional data features. Until now, AutoML systems have been proposed primarily for classification and regression and have lacked causal impact analytics. In this study, we address the possible challenges of extending AutoML to causal impact analytics for policy evaluation. A simplified causal inference model has been implemented on a generic AutoML system, the Spark ML pipeline, for a policy evaluation scenario: the analysis of (inter-)national stock market impacts based on GDELT big datasets.
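To make the idea of causal impact analytics concrete, the following is a minimal sketch and not the implementation described in this study. It assumes a single control series that the policy does not affect, fits ordinary least squares on the pre-intervention period, and treats the gap between the observed and predicted post-intervention values as the estimated impact. The function names (`fit_line`, `causal_impact`) and the synthetic data are hypothetical and chosen for illustration only.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def causal_impact(target, control, intervention_idx):
    """Estimate the effect of an intervention starting at intervention_idx.

    Assumption: `control` is unaffected by the intervention and its
    relationship to `target` stays stable over the whole period.
    """
    # Learn the target/control relationship from pre-intervention data only.
    a, b = fit_line(control[:intervention_idx], target[:intervention_idx])

    # Counterfactual: what the target would have been without the policy.
    counterfactual = [a + b * c for c in control[intervention_idx:]]

    # Pointwise impact: observed minus counterfactual.
    observed = target[intervention_idx:]
    pointwise = [o - cf for o, cf in zip(observed, counterfactual)]
    return {
        "pointwise": pointwise,
        "average": mean(pointwise),
        "cumulative": sum(pointwise),
    }


# Synthetic example (illustrative data, not GDELT or stock market data).
control = [float(v) for v in range(1, 11)]
target = [2.0 * c + 1.0 for c in control]
for i in range(6, 10):
    target[i] += 3.0  # simulated policy effect from index 6 onward

result = causal_impact(target, control, intervention_idx=6)
print(result["average"], result["cumulative"])
```

A real policy evaluation would need uncertainty estimates and checks that the control series is genuinely unaffected by the intervention; this sketch only shows the pre/post counterfactual logic.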
