Full information maximum likelihood estimation in factor analysis with a lot of missing values

We consider the problem of full information maximum likelihood (FIML) estimation in a factor analysis model when a majority of the data values are missing. The expectation-maximization (EM) algorithm is often used to find the FIML estimates, with the missing values on observed variables included in the complete data. However, the EM algorithm has an extremely high computational cost when the number of observations is large and/or many values are missing. In this paper, we propose a new algorithm that is based on the EM algorithm but computes the FIML estimates efficiently. A significant improvement in computational speed is realized by not treating the missing values on observed variables as part of the complete data. Our algorithm is applied to a real data set collected from a Web questionnaire that asks about first impressions of humans; almost $90\%$ of the data values are missing. When many data values are missing, it is not clear whether the FIML procedure can achieve good estimation accuracy, even if the number of observations is large. To investigate this, we conduct Monte Carlo simulations under a wide variety of sample sizes.
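
For orientation, the objective that FIML maximizes can be written in its standard textbook form. The notation here is an assumption introduced for illustration and is not taken from the paper: $x_i^{o}$ is the subvector of values actually observed for observation $i$, $p_i$ is its length, and $\mu_i$, $\Lambda_i$, $\Psi_i$ are the rows (and, for $\Psi_i$, the diagonal entries) of the mean vector, loading matrix, and unique-variance matrix that correspond to those observed variables. Assuming a normal factor model with covariance $\Sigma = \Lambda\Lambda^{\top} + \Psi$ and an ignorable missing-data mechanism, the observed-data log-likelihood is

$$
\ell(\mu, \Lambda, \Psi) = -\frac{1}{2} \sum_{i=1}^{N} \left\{ p_i \log(2\pi) + \log\lvert\Sigma_i\rvert + (x_i^{o} - \mu_i)^{\top} \Sigma_i^{-1} (x_i^{o} - \mu_i) \right\}, \qquad \Sigma_i = \Lambda_i \Lambda_i^{\top} + \Psi_i .
$$

Each term involves only the $p_i$ observed entries of observation $i$, so when roughly $90\%$ of the values are missing the matrices $\Sigma_i$ are much smaller than the full covariance matrix $\Sigma$.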