Machine Learning Methods That Economists Should Know About

We discuss the relevance of the recent machine learning (ML) literature for economics and econometrics. First, we discuss the differences in goals, methods, and settings between the ML literature and the traditional econometrics and statistics literatures. Then we discuss some specific methods from the ML literature that we view as important for empirical researchers in economics. These include supervised learning methods for regression and classification, unsupervised learning methods, and matrix completion methods. Finally, we highlight newly developed methods at the intersection of ML and econometrics. These methods typically perform better than either off-the-shelf ML or more traditional econometric methods when applied to particular classes of problems, including causal inference for average treatment effects, optimal policy estimation, and estimation of the counterfactual effect of price changes in consumer choice models.
