Econometric modeling of panel data using parallel computing with Apache Spark

The aim of this article is to provide a method for determining fixed effects estimators using the MapReduce programming model implemented in Apache Spark. Of the many known algorithms, two common approaches were exploited: the within transformation and the least squares dummy variables (LSDV) method. The efficiency of the computations was demonstrated by solving a specially crafted example on sample data. Based on theoretical analysis and computer experiments, it can be stated that Apache Spark is an efficient tool for modeling panel data, especially when it comes to Big Data.
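To make the first of the two approaches concrete, the sketch below illustrates the within transformation on a small synthetic panel using plain NumPy. It is an illustrative example only, not the article's Spark implementation: the data (entities, periods, the true coefficient `beta`, and the entity effects `alpha`) are hypothetical, and the demeaning that Spark would perform per group via MapReduce is done here in memory.

```python
import numpy as np

# Hypothetical panel: N entities observed over T periods, one regressor.
rng = np.random.default_rng(0)
N, T = 3, 50
alpha = np.array([1.0, 5.0, -2.0])   # unobserved entity fixed effects (assumed)
beta = 2.0                           # true slope coefficient (assumed)
x = rng.normal(size=(N, T))
y = alpha[:, None] + beta * x + 0.1 * rng.normal(size=(N, T))

# Within transformation: subtract each entity's time mean from y and x,
# which removes the fixed effects alpha_i from the model.
y_w = y - y.mean(axis=1, keepdims=True)
x_w = x - x.mean(axis=1, keepdims=True)

# Pooled OLS on the demeaned data yields the fixed-effects (within) estimator.
beta_hat = (x_w * y_w).sum() / (x_w ** 2).sum()
print(beta_hat)
```

In a Spark setting, the per-entity means would be computed with a group-by aggregation (the "reduce" step) and joined back to the observations before the final least-squares step.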