Paper information - A study of the application of statistical methods for Big data

A study of the application of statistical methods for Big data

Applying analysis and classification methods to big data is difficult. Several proposals randomly divide the population into b sub-samples and aggregate the results with an estimator based on the average of the parameters estimated on these sub-samples. This paper aims to find a solution that minimizes computation by selecting a small number b* of sub-samples while keeping the same precision. The approach can be applied to several methods to measure its relevance.
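The divide-and-aggregate idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it assumes simple linear regression as the statistical method, uses synthetic data, and picks hypothetical values b = 20 and b* = 5. Taking the first b* sub-samples is only a placeholder, since the abstract does not say how b* is selected. All function names are invented for this example.

```python
import random
from statistics import fmean


def fit_slope_intercept(points):
    """Ordinary least squares for y = a + b*x on a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def split_randomly(points, b, rng):
    """Shuffle the population and deal it into b disjoint sub-samples."""
    shuffled = points[:]
    rng.shuffle(shuffled)
    return [shuffled[i::b] for i in range(b)]


def averaged_estimate(subsamples):
    """Fit each sub-sample separately and average the fitted parameters."""
    fits = [fit_slope_intercept(s) for s in subsamples]
    intercept = fmean(f[0] for f in fits)
    slope = fmean(f[1] for f in fits)
    return intercept, slope


rng = random.Random(0)

# Synthetic population: y = 2 + 3x + Gaussian noise (illustrative data only).
population = []
for _ in range(20000):
    x = rng.uniform(0, 10)
    population.append((x, 2.0 + 3.0 * x + rng.gauss(0, 1)))

b, b_star = 20, 5  # hypothetical values, not taken from the paper
subsamples = split_randomly(population, b, rng)

full_fit = fit_slope_intercept(population)           # fit on all the data
all_b_fit = averaged_estimate(subsamples)            # average over all b sub-samples
b_star_fit = averaged_estimate(subsamples[:b_star])  # average over only b* of them

print("full data :", full_fit)
print("b  = 20   :", all_b_fit)
print("b* = 5    :", b_star_fit)
```

Comparing the three printed parameter pairs gives a rough sense of how much precision is lost when only b* of the b sub-samples are fitted. Each sub-sample fit touches only 1/b of the data, so the b* variant does about b*/b of the fitting work.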

Abdallah Abarda | Mohamed Dakkon | Mustapha El Moudden | Samya Tajmouati | Mustapha Esghir

[1] Hong Shu. Big data analytics: six techniques, 2016, Geo spatial Inf. Sci.

[2] A Divided Latent Class analysis for Big Data, 2017, FNC/MobiSPC.

[3] Abdallah Abarda, et al. Probabilistic approach to estimate the risk of being a cybercrime victim, 2015.

[4] Martin J. Wainwright, et al. Divide and Conquer Kernel Ridge Regression, 2013, COLT.

[5] D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper, 1977.

[6] Drew A. Linzer, et al. poLCA: An R Package for Polytomous Variable Latent Class Analysis, 2011.

[7] Sunghae Jun, et al. A Divided Regression Analysis for Big Data, 2015.

[8] Runze Li, et al. Statistical inference in massive data sets, 2012.

[9] Geoff Hulten, et al. A General Framework for Mining Massive Data Streams, 2003.

[10] Han Liu, et al. A Partially Linear Framework for Massive Heterogeneous Data, 2014, Annals of Statistics.

[11] Srijan Sengupta, et al. A Subsampled Double Bootstrap for Massive Data, 2015, 1508.01126.

[12] Application of latent class analysis to identify the youth population who risk being cybercrime victim on social networks, 2015.