Statistical and Computational Needs for Big Data Challenges

The traditional way of formatting information from transactional systems to make it available for “statistical processing” does not work when data is arriving in huge volumes from diverse sources, and when even the formats may be changing. Faced with this volume and diversity, it is essential to develop techniques that make the best use of these data stocks in order to extract the maximum amount of information and knowledge. Traditional analysis methods have largely assumed that statisticians can work with data within the confines of their own computing environment. But the growth in the amount of data is changing that paradigm, especially given the progress in computational data analysis. This chapter builds upon existing sources but also goes further to answer the question: What needs to be done in this area to deal with big data challenges?
