We propose using the latent Markov model to estimate multiple causal effects in observational studies where there are unobserved baseline differences between individuals. The model, tailored in its basic formulation to longitudinal data analysis, was first introduced by Wiggins (1951) and then formalized in his Ph.D. thesis (Wiggins, 1955). Bartolucci et al. (2013) present several extensions of the basic formulation and propose new models. Because the assumptions encoded by the model can be represented by a path diagram, this class of models is a powerful tool for the analysis of statistical data. Indeed, as stated in Pennoni (2014), such models may be seen as built on the foundation of the graphical causal models first proposed by Wright (1921) in genetics. Many statistical models tailored to the estimation of causal effects have been proposed since then, and the potential outcome framework has proved to be one of the most useful tools. In a longitudinal setting, however, this framework is not yet well developed, and the same holds for some powerful models developed in the econometric context; see also Romeo (2014). Building on the above models and on a recent proposal of Lanza et al. (2013), we introduce a new use of propensity score weighting (Rosenbaum and Rubin, 1983) for multivariate responses observed at multiple time occasions. We state the assumptions that must be tenable for the proposed approach to be used in the context of the study. The use of the latent Markov model helps to obtain a reliable estimate of the average causal effect. An interesting feature of the proposed approach is its flexibility: the adopted parameterization allows us to deal with any kind of response variable.
The model is fitted by a two-step maximum likelihood procedure. First, a multinomial logit model is estimated for the probability of taking each type of treatment given suitably chosen pretreatment covariates. Then, a weighted log-likelihood of the latent Markov (LM) model, with weights computed from the first-step estimates, is maximized to obtain the final parameter estimates. This second step relies on the EM algorithm (Baum et al., 1970; Dempster et al., 1977), and reliable standard errors for the model parameters are obtained by a nonparametric bootstrap (Davison and Hinkley, 1997). The proposed application is particularly suitable for illustrating the model formulation, as it concerns the evaluation of human capital development, which is related to a critical period of
∗Presented at the meeting of the FIRB (“Futuro in ricerca” 2012) project “Mixture and latent variable models for causal-inference and analysis of socio-economic data”, Roma (IT), January 01-23, 2015
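The first step of the estimation procedure can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the weights are inverse estimated treatment probabilities, rescaled to sum to the sample size; the abstract says only that the weights are computed from the first-step estimates. The function names, the synthetic data, and the plain gradient-ascent fit are all hypothetical. The second step (EM maximization for the LM model) is not shown: `weighted_loglik` only indicates how per-unit log-likelihood contributions would be weighted.

```python
import math
import random


def softmax(scores):
    """Convert linear scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def treatment_probs(beta, xi):
    """Estimated P(Z = k | x) for every treatment k, intercept included."""
    row = [1.0] + list(xi)
    return softmax([sum(b * v for b, v in zip(bk, row)) for bk in beta])


def fit_multinomial_logit(X, z, n_treatments, lr=0.5, n_iter=500):
    """Fit a multinomial logit by gradient ascent on the mean log-likelihood.

    Treatment 0 is the reference category: its coefficients stay at zero.
    """
    n, p = len(X), len(X[0]) + 1
    beta = [[0.0] * p for _ in range(n_treatments)]
    for _ in range(n_iter):
        grad = [[0.0] * p for _ in range(n_treatments)]
        for xi, zi in zip(X, z):
            row = [1.0] + list(xi)
            probs = treatment_probs(beta, xi)
            for k in range(1, n_treatments):
                resid = (1.0 if zi == k else 0.0) - probs[k]
                for j in range(p):
                    grad[k][j] += resid * row[j]
        for k in range(1, n_treatments):
            for j in range(p):
                beta[k][j] += lr * grad[k][j] / n
    return beta


def ipw_weights(beta, X, z):
    """Assumed weight form: 1 / P(Z = z_i | x_i), rescaled to sum to n."""
    raw = [1.0 / treatment_probs(beta, xi)[zi] for xi, zi in zip(X, z)]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


def weighted_loglik(weights, unit_logliks):
    """Weighted log-likelihood: sum of w_i times the unit-i contribution."""
    return sum(w * l for w, l in zip(weights, unit_logliks))


# Synthetic illustration: one pretreatment covariate, three treatments.
random.seed(0)
X = [[random.gauss(0.0, 1.0)] for _ in range(200)]
z = []
for xi in X:
    p0, p1, _ = softmax([0.0, 0.8 * xi[0], -0.8 * xi[0]])
    u = random.random()
    z.append(0 if u < p0 else (1 if u < p0 + p1 else 2))

beta = fit_multinomial_logit(X, z, n_treatments=3)
weights = ipw_weights(beta, X, z)
```

In a full analysis, `unit_logliks` would be the individual log-likelihood contributions of the LM model at the current parameter values, and the weighted sum would be maximized at each M-step. Bootstrap standard errors would come from resampling individuals and repeating both steps on each resample.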
[1] D. A. Kenny, et al. Correlation and Causation. 1937, Wilmott.
[2] D. Rubin, et al. The central role of the propensity score in observational studies for causal effects. 1983.
[3] Pennoni, et al. Issues on the Estimation of Latent Variable and Latent Class Models with Social Science Applications. 2004.
[4] P. Games. Correlation and Causation: A Logical Snafu. 1990.
[5] Stephanie T. Lanza, et al. Causal Inference in Latent Class Analysis. 2013, Structural Equation Modeling: A Multidisciplinary Journal.
[6] Anthony C. Davison, et al. Bootstrap Methods and Their Application. 1998.
[7] D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper. 1977.
[8] Anca Draghici, et al. Debate on the Multilevel Model of the Human Capital Measurement. 2014.
[9] Antonello Maruotti. Latent Markov Models for longitudinal data. 2014.
[10] L. Baum, et al. A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains. 1970.