A novel approach of empirical likelihood with massive data

Statistical analysis of large datasets is challenging because of limited device memory and excessive computation time. The divide-and-conquer (DC) algorithm is an effective solution, but it still has limitations for statistical inference. Empirical likelihood is an important semiparametric and nonparametric statistical method for parameter estimation and statistical inference, and estimating equations build a bridge between empirical likelihood and traditional statistical methods, which makes empirical likelihood widely applicable to various traditional statistical models. In this paper, we propose a novel approach to the challenges posed by empirical likelihood with massive data, called split sample mean empirical likelihood (SSMEL). Our approach provides a unique perspective for solving big data problems. We show that the SSMEL estimator has the same estimation efficiency as the empirical likelihood estimator based on the full dataset and that it preserves Wilks' theorem, so the proposed approach can be used for statistical inference. The effectiveness of the proposed approach is illustrated using simulation studies and real data analysis.
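
The abstract does not spell out how SSMEL is constructed, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes the simplest estimating function, g(x, θ) = x − θ for a scalar mean, splits the data into k blocks, and applies ordinary empirical likelihood to the k block means instead of the n raw observations. The function names `el_log_ratio`, `block_means`, and `split_mean_el_stat` are hypothetical and chosen for illustration.

```python
import numpy as np


def el_log_ratio(z, n_iter=200):
    """Return -2 log empirical likelihood ratio for H0: E[z] = 0 (scalar z)."""
    z = np.asarray(z, dtype=float)
    # If all z_i share a sign, zero lies outside the convex hull of the data
    # and the empirical likelihood ratio is zero.
    if z.min() >= 0.0 or z.max() <= 0.0:
        return np.inf
    # The Lagrange multiplier must keep every weight positive:
    # 1 + lam * z_i > 0 for all i.
    lo = -1.0 / z.max()
    hi = -1.0 / z.min()
    pad = 1e-10 * (hi - lo)
    lo, hi = lo + pad, hi - pad
    # score(lam) = sum z_i / (1 + lam * z_i) is strictly decreasing on (lo, hi),
    # so plain bisection finds its unique root.
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        score = np.sum(z / (1.0 + mid * z))
        if score > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * np.sum(np.log1p(lam * z))


def block_means(x, k):
    """Split x into k nearly equal blocks and return the mean of each block."""
    return np.array([block.mean() for block in np.array_split(x, k)])


def split_mean_el_stat(x, theta, k):
    """Assumed construction: empirical likelihood applied to block means of x - theta."""
    return el_log_ratio(block_means(x, k) - theta)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(loc=1.0, scale=2.0, size=100_000)
    # Under H0 the statistic would be compared with a chi-square(1) quantile,
    # for example 3.84 at the 5% level.
    print(split_mean_el_stat(x, theta=1.0, k=200))
```

In this sketch the optimization involves only k summary values rather than all n observations, which is the kind of memory and time saving the abstract motivates. Whether the paper's procedure matches this construction is an assumption.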
