On Binscatter

Binscatter is very popular in applied microeconomics. It provides a flexible yet parsimonious way of visualizing and summarizing large data sets in regression settings, and it is often used for informal evaluation of substantive hypotheses such as linearity or monotonicity of the regression function. This paper presents a foundational, thorough analysis of binscatter: we give an array of theoretical and practical results that aid both in understanding current practices (i.e., their validity or lack thereof) and in offering theory-based guidance for future applications. Our main results include principled selection of the number of bins, confidence intervals and bands, hypothesis tests for parametric and shape restrictions of the regression function, and several other new methods, applicable to canonical binscatter as well as to its higher-order polynomial, covariate-adjusted, and smoothness-restricted extensions. In particular, we highlight important methodological problems with the covariate adjustment methods used in current practice. We also discuss extensions to clustered data. Our results are illustrated with simulated and real data throughout. Companion general-purpose software packages for \texttt{Stata} and \texttt{R} are provided. Finally, from a technical perspective, we obtain new theoretical results for partitioning-based series estimation that may be of independent interest.
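For readers unfamiliar with the construction, the canonical binscatter referred to above can be sketched as follows: sort the observations by the regressor, split them into quantile-spaced bins, and plot the within-bin mean of the outcome against the within-bin mean of the regressor. This is only a minimal illustration under stated assumptions. The function name `binscatter_points` and its arguments are hypothetical, it is not the interface of the companion \texttt{Stata} or \texttt{R} packages, and it implements none of the paper's bin selection, inference, or covariate adjustment results.

```python
def binscatter_points(x, y, n_bins=10):
    """Return one (mean of x, mean of y) pair per quantile-spaced bin of x.

    Hypothetical helper for illustration only. Bins are formed by sorting
    on x and cutting the sorted sample into n_bins groups of nearly equal
    size, so tied x values may fall into adjacent bins.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if not 1 <= n_bins <= len(x):
        raise ValueError("n_bins must be between 1 and the sample size")

    pairs = sorted(zip(x, y))  # order observations by the regressor
    n = len(pairs)
    points = []
    for j in range(n_bins):
        # Integer cut points give bins whose sizes differ by at most one.
        lo = j * n // n_bins
        hi = (j + 1) * n // n_bins
        xs = [p[0] for p in pairs[lo:hi]]
        ys = [p[1] for p in pairs[lo:hi]]
        points.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return points


if __name__ == "__main__":
    # Toy data with an exactly linear relationship, y = 2x + 1.
    x = list(range(100))
    y = [2 * v + 1 for v in x]
    for bx, by in binscatter_points(x, y, n_bins=10):
        print(f"{bx:6.2f} {by:7.2f}")
```

The number of bins is left as a free argument here. Choosing it in a principled way is one of the questions the paper addresses, and the default of 10 in this sketch is arbitrary.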
