The emerging wave of Big Data applications is flooding all branches of scientific knowledge, and applied economic and statistical research carried out in central banks and policy-advising institutions is no exception. In this paper we present one of the most promising platforms providing a unifying framework for researchers who wish to harness their knowledge of popular and accessible computing environments such as R and Python. Along with their Integrated Development Environments (IDEs), these are two of the most widely used numerical computing frameworks: both are open source, provide built-in capabilities for statistical analysis, and offer a wide array of user-contributed packages covering an ample set of analytical tools suitable for different scientific applications. Within the Big Data framework, we show how to provide researchers with a suitable programming environment that allows them to tame the intrinsic complexity of a High Performance Computing cluster. We conclude with a few empirical applications based on classical econometric and machine learning models.