An Analysis of Quasi-complete Binary Data with Logistic Models: Applications to Alcohol Abuse Data

This paper examines the issues that arise when binary data exhibiting quasi-complete separation are analyzed with logistic regression models in some popular statistical software packages. Results from three SAS procedures (LOGISTIC, CATMOD and GENMOD) and from the SPSS pull-down menus were compared. The review was prompted by the observation that some users of these procedures do not always account for data irregularities when interpreting the computer output. This may be partly because the information reported by some statistical packages is not sufficient for the user to make informed decisions about the results. The dataset that motivated this review came from a substance abuse treatment outcome study in which thirty subjects were followed up to determine the proportion who relapsed and the factors that might predict relapse. Binary logistic regression models were used to identify predictors of relapse. The results showed that the data exhibited quasi-complete separation, which limits the interpretation of the fitted models. When applied to the quasi-completely separated data, the SAS procedures produced very large standard errors, ran more iterations, and gave a useful warning about the configuration of the data. In contrast, SPSS produced estimates with smaller standard errors and did not necessarily warn about the data configuration. Researchers who use statistical software without understanding the iterative estimation procedures it employs should therefore be aware that analyzing data with quasi-complete or complete separation can lead to erroneous conclusions.
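The behaviour described above (very large standard errors after many iterations) can be reproduced with a small sketch. The Python below is a hypothetical illustration only: it is not the algorithm used by SAS or SPSS, and it does not use the study's data. It assumes a single binary predictor `x`, uses made-up toy data in which every subject with `x = 1` has `y = 1`, checks the 2x2 table for empty cells, and then runs plain Newton-Raphson (which, for the logit link, coincides with Fisher scoring) for a fixed number of steps with no convergence test, so the drift in the estimates stays visible. The helper names `zero_cells`, `_score_info` and `fit_logistic` are invented for this example.

```python
import math


def zero_cells(x, y):
    """Return the empty cells (x, y) of the 2x2 table of a 0/1 predictor by a 0/1 outcome.

    Assuming both levels of x and both outcomes occur, one empty cell indicates
    quasi-complete separation and two empty cells indicate complete separation;
    in either case the maximum likelihood estimate of the slope does not exist.
    """
    counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for xi, yi in zip(x, y):
        counts[(xi, yi)] += 1
    return [cell for cell, n in counts.items() if n == 0]


def _score_info(b0, b1, x, y):
    """Score vector and information matrix entries for logit P(y=1) = b0 + b1*x."""
    g0 = g1 = i00 = i01 = i11 = 0.0
    for xi, yi in zip(x, y):
        eta = b0 + b1 * xi
        p = 1.0 / (1.0 + math.exp(-eta))  # P(y = 1)
        q = 1.0 / (1.0 + math.exp(eta))   # P(y = 0), computed directly to avoid 1 - p cancellation
        resid = q if yi == 1 else -p      # y - p without cancellation
        w = p * q                         # binomial variance weight
        g0 += resid
        g1 += resid * xi
        i00 += w
        i01 += w * xi
        i11 += w * xi * xi
    return g0, g1, i00, i01, i11


def fit_logistic(x, y, n_iter):
    """Run n_iter Newton-Raphson steps from (0, 0); return b0, b1 and SE(b1)."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0, g1, i00, i01, i11 = _score_info(b0, b1, x, y)
        det = i00 * i11 - i01 * i01
        # Newton step beta <- beta + I^{-1} g, with the 2x2 inverse written out
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det
    _, _, i00, i01, i11 = _score_info(b0, b1, x, y)
    se_b1 = math.sqrt(i00 / (i00 * i11 - i01 * i01))  # sqrt of [I^{-1}]_{11}
    return b0, b1, se_b1


# Hypothetical toy data (NOT the study data): every subject with x = 1 has y = 1.
x = [0] * 6 + [1] * 4
y = [0, 0, 0, 0, 1, 1] + [1, 1, 1, 1]
print(zero_cells(x, y))  # [(1, 0)]
for n_iter in (5, 10, 25):
    b0, b1, se_b1 = fit_logistic(x, y, n_iter)
    print(f"{n_iter:>2} steps: b1 = {b1:7.2f}  SE(b1) = {se_b1:12.1f}")
```

In this sketch the intercept settles at log(2/4), the log-odds of relapse in the `x = 0` group, while `b1` keeps growing by roughly one unit per step and its standard error grows roughly like exp(b1/2). Because the log-likelihood barely changes once `b1` is large, a routine that stops on a log-likelihood criterion can report an arbitrary large slope with a huge standard error unless it also checks the data for separation.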