Missing data imputation, matching and other applications of random recursive partitioning

This paper describes applications of the random recursive partitioning (RRP) method. RRP generates a proximity matrix that can be used in non-parametric matching problems such as hot-deck missing data imputation and average treatment effect estimation. It is a Monte Carlo procedure that randomly generates non-empty recursive partitions of the data and measures the proximity between two observations as the empirical frequency with which they fall in the same cell of these random partitions, over all replications. Even in the presence of missing data, the method is invariant under monotonic transformations of the data, but no other formal properties of the method are known yet. Monte Carlo experiments were therefore conducted to explore its performance. Companion software is available as a package for the R statistical environment.
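
The proximity idea can be illustrated with a minimal Python sketch. It is not the authors' algorithm or the R package's interface: the random axis-aligned cuts, the function names (`random_partition`, `rrp_proximity`, `hot_deck_impute`) and the parameters (`n_rep`, `min_size`, `seed`) are all assumptions made for illustration, and the covariates are assumed to be fully observed. Cuts are placed at observed data values, so the resulting cells depend only on the ranks of the data.

```python
import random


def random_partition(X, min_size, rng):
    """Split the rows of X into cells using random axis-aligned cuts.

    Illustrative stand-in for one random recursive partition; every
    returned cell is a non-empty list of row indices.
    """
    cells = []
    stack = [list(range(len(X)))]
    while stack:
        idx = stack.pop()
        if len(idx) < 2 * min_size:
            cells.append(idx)  # too small to split further
            continue
        j = rng.randrange(len(X[0]))  # pick a covariate at random
        col = sorted(X[i][j] for i in idx)
        # Admissible cuts leave at least min_size rows on each side.
        cuts = [col[m - 1]
                for m in range(min_size, len(col) - min_size + 1)
                if col[m - 1] < col[m]]
        if not cuts:
            cells.append(idx)  # ties prevent an admissible cut
            continue
        cut = rng.choice(cuts)
        stack.append([i for i in idx if X[i][j] <= cut])
        stack.append([i for i in idx if X[i][j] > cut])
    return cells


def rrp_proximity(X, n_rep=200, min_size=2, seed=0):
    """Proximity = share of replications in which two rows share a cell."""
    rng = random.Random(seed)
    n = len(X)
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_rep):
        for cell in random_partition(X, min_size, rng):
            for a in cell:
                for b in cell:
                    counts[a][b] += 1
    return [[c / n_rep for c in row] for row in counts]


def hot_deck_impute(y, P):
    """Fill each missing y[i] (None) with the value of the most proximate donor."""
    donors = [j for j, v in enumerate(y) if v is not None]
    filled = list(y)
    for i, v in enumerate(y):
        if v is None:
            best = max(donors, key=lambda j: P[i][j])
            filled[i] = y[best]
    return filled


if __name__ == "__main__":
    X = [[0.0, 1.0], [1.0, 3.0], [2.0, 2.0],
         [10.0, 12.0], [11.0, 10.0], [12.0, 11.0]]
    y = [1.0, None, 1.5, 9.0, None, 8.5]
    P = rrp_proximity(X, n_rep=100, min_size=2, seed=1)
    print(hot_deck_impute(y, P))
```

In this sketch, applying a strictly increasing transformation to any covariate and reusing the same seed reproduces the same proximity matrix, which mirrors the invariance property stated in the abstract. The hot-deck step simply borrows the observed value of the donor with the highest proximity.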
