The U.S. Consumer Product Safety Commission staff uses raking to impute unknown or missing values in fire incident reports. A typical analysis imputes up to six variables, which describe the cause of the fire and the way it propagates through a building. Typically, 25 to 40 percent of the values of these variables are unknown. All of the imputed variables are categorical, with up to 100 values each. The full crosstabulation may run to 100,000 cells or more, with a modal cell count of zero. In this context, raking can become unstable: it can fail to converge, or it can produce results in which some cell counts are lower than their original, pre-imputation values. This paper describes some strategies for raking high-dimensional sparse tables.
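For readers unfamiliar with raking (also called iterative proportional fitting), the following is a minimal two-way sketch in Python with NumPy. It is illustrative only and is not the procedure used by the Commission staff. The function name `rake`, its parameters, and the example tables are assumptions introduced here. The sketch alternately rescales rows and columns toward target margins and reports whether the margins were matched within a tolerance, which is one way the convergence failure on sparse tables shows up.

```python
import numpy as np


def rake(seed, row_targets, col_targets, max_iter=1000, tol=1e-8):
    """Two-way raking (iterative proportional fitting).

    Hypothetical helper for illustration: returns the fitted table and a
    flag saying whether both margins were matched within `tol`.
    """
    fitted = np.asarray(seed, dtype=float).copy()
    row_targets = np.asarray(row_targets, dtype=float)
    col_targets = np.asarray(col_targets, dtype=float)

    for _ in range(max_iter):
        # Scale each row to its target; rows that sum to zero are left
        # unchanged because no scaling factor can repair them.
        row_sums = fitted.sum(axis=1)
        row_factors = np.divide(row_targets, row_sums,
                                out=np.ones_like(row_sums),
                                where=row_sums > 0)
        fitted *= row_factors[:, None]

        # Scale each column to its target in the same way.
        col_sums = fitted.sum(axis=0)
        col_factors = np.divide(col_targets, col_sums,
                                out=np.ones_like(col_sums),
                                where=col_sums > 0)
        fitted *= col_factors[None, :]

        # Stop once both sets of margins agree with the targets.
        row_err = np.abs(fitted.sum(axis=1) - row_targets).max()
        col_err = np.abs(fitted.sum(axis=0) - col_targets).max()
        if max(row_err, col_err) < tol:
            return fitted, True

    return fitted, False


# A dense table converges quickly.
dense_fit, dense_ok = rake([[1, 2], [3, 4]], [40, 60], [30, 70])

# A table with an all-zero row and a positive row target cannot be
# fitted: the two scaling steps keep undoing each other.
sparse_fit, sparse_ok = rake([[0, 0], [1, 1]], [5, 5], [5, 5], max_iter=50)
```

The sketch checks only convergence. It does not guard against fitted counts falling below the observed counts, the second instability mentioned above.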