Testing for dependence in multivariate probit models

SUMMARY
A multivariate probit model is considered and the Lagrange multiplier or score statistic for testing independence is derived. The limiting distribution of the statistic takes a simple form under the null hypothesis and for local alternatives. The statistic is a natural generalization of Pearson's chi-squared for a 2 x 2 table. An example is given.

Some key words: Asymptotic test; Lagrange multiplier test; Score test.

Binary observations are frequently modelled as occurring according to the sign of an underlying continuous variable; for example, the observations d_i might depend on unobserved y_i according to d_i = 1 if y_i > 0 and d_i = 0 if y_i < 0, for i = 1, ..., N observations. With y_i normally distributed with mean x_i β and variance one, the model is a probit model. Here x_i is a 1 x k vector of observations on exogenous variables and β is a k x 1 parameter vector.

If sequences of observations are available for each i, so that the observations are d_it for t = 1, ..., T, the natural extension of the univariate probit model is multivariate probit. The unobservable T x 1 random vector y_i is assumed normally distributed with tth element having mean x_it β_t. The correlation matrix of y_i is denoted by R. Unless R = I, so that the elements of y_i are uncorrelated, this model is substantially more difficult to estimate efficiently than the univariate model, as multinormal probabilities enter the likelihood function. Throughout the paper the asymptotics let N increase with T held fixed. This model is discussed by Ashford & Sowden (1970).

This paper proposes a test of the hypothesis that R = I. The test proposed is the 'score' or Lagrange multiplier test, which is asymptotically equivalent to the Wald and likelihood ratio tests but does not require calculation of unrestricted estimates. This test is discussed by Silvey (1959), Rao (1973, p. 417) and Cox & Hinkley (1974, p. 315).
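
The data-generating process described above can be illustrated with a short simulation. The sketch below is not taken from the paper: the function names `simulate_multivariate_probit` and `pearson_chi2_2x2`, the intercept-only design and the value 0.5 for the latent correlation are all assumptions made for illustration. It draws the latent vector with mean x_it β_t and correlation matrix R, records the signs, and then computes the ordinary Pearson chi-squared for the 2 x 2 table of the two binary outcomes, which is only the special case the summary refers to and not the general score statistic.

```python
import numpy as np


def simulate_multivariate_probit(x, beta, R, rng):
    """Draw binary outcomes d (N x T) from a multivariate probit model.

    x    : array (N, T, k), exogenous variables x_it (hypothetical layout)
    beta : array (T, k), one coefficient vector beta_t per period
    R    : array (T, T), correlation matrix of the latent vector y_i
    """
    N, T, _ = x.shape
    mean = np.einsum("ntk,tk->nt", x, beta)        # mean of y_it is x_it beta_t
    chol = np.linalg.cholesky(R)                   # R = chol @ chol.T
    y = mean + rng.standard_normal((N, T)) @ chol.T  # rows have correlation R
    return (y > 0).astype(int)                     # d_it = 1 if y_it > 0


def pearson_chi2_2x2(d1, d2):
    """Pearson chi-squared statistic for the 2 x 2 table of two binary series."""
    table = np.array(
        [[np.sum((d1 == a) & (d2 == b)) for b in (0, 1)] for a in (0, 1)],
        dtype=float,
    )
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    return float(((table - expected) ** 2 / expected).sum())


# Illustrative settings (assumptions): T = 2, intercept only, beta_t = 0.
rng = np.random.default_rng(0)
N, T, k = 2000, 2, 1
x = np.ones((N, T, k))
beta = np.zeros((T, k))

R_null = np.eye(T)                                 # R = I
R_alt = np.array([[1.0, 0.5], [0.5, 1.0]])         # R != I

d_null = simulate_multivariate_probit(x, beta, R_null, rng)
d_alt = simulate_multivariate_probit(x, beta, R_alt, rng)

print("chi-squared under R = I :", pearson_chi2_2x2(d_null[:, 0], d_null[:, 1]))
print("chi-squared under R != I:", pearson_chi2_2x2(d_alt[:, 0], d_alt[:, 1]))
```

With R = I the statistic should behave like a chi-squared variate on one degree of freedom, while correlation in the latent vector should inflate it; only univariate quantities and cell counts are needed, with no multinormal probabilities.
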
Tests of the hypothesis that R = I are useful for several reasons: (i) calculated probabilities of sequences under the assumption that R = I will be wrong if R ≠ I; (ii) the univariate probit estimates of the β_t will be asymptotically efficient if R = I, and furthermore the estimates of the different β_t will be asymptotically uncorrelated; however, if R ≠ I then the estimates and their standard errors will be consistent, but the estimates will be correlated between time periods so difficulties