Skip pattern analysis for detection of undetermined and inconsistent data

Missing data is a common problem in clinical survey trials. Skip patterns, which route a respondent past a group of questions that are not relevant to them, are one source of missing values in medical datasets. Applying an imputation technique to values that are missing because of a skip pattern may introduce misinformation. Skip pattern analysis detects such non-applicable data, and it also detects undetermined and inconsistent data. The Medical, Epidemiological and Social Aspects of Aging (MESA) questionnaire is answered by a large number of subjects, which calls for an automated method; manual methods are costly and may not give reliable results. A directed acyclic graph is generated from the questionnaire, and a graph-theoretic method is proposed to detect each type of missing data. The method finds a minimal deletion set of nodes, that is, a minimal set of nodes whose deletion leaves a connected graph behind. The deleted nodes can be regarded as noise. Experiments on a subset of the MESA data show 16.04% non-applicable data, 7.09% genuinely missing data, 0.61% undetermined data, and 0.015% inconsistent data. The method can be used to preprocess the dataset and to estimate its noise.
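
To make the four categories concrete, the following is a minimal sketch of how blank and answered items could be labelled by walking a skip-pattern graph. It is a simplified path-walking illustration, not the minimal deletion set method itself. The questions `Q1` to `Q4`, the `EDGES` table, the `classify` function, and the working definitions of each category (a blank item off the respondent's path is non-applicable, a blank item on the path is genuinely missing, an answered item off the path is inconsistent, and a blank item after an unanswered branching question is undetermined) are all assumptions made for this example.

```python
from typing import Dict, List, Optional

# Hypothetical toy questionnaire, listed in the order the questions are asked.
ORDER: List[str] = ["Q1", "Q2", "Q3", "Q4"]

# EDGES[q][answer] gives the next question; "*" matches any answer and
# None ends the survey. Answering "no" to Q1 skips Q2 and Q3.
EDGES: Dict[str, Dict[str, Optional[str]]] = {
    "Q1": {"yes": "Q2", "no": "Q4"},
    "Q2": {"*": "Q3"},
    "Q3": {"*": "Q4"},
    "Q4": {"*": None},
}


def classify(responses: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Label every question for one respondent (assumed category definitions)."""
    labels: Dict[str, str] = {}
    current: Optional[str] = ORDER[0]
    unresolved_from: Optional[str] = None  # where the path could not be followed

    # Walk the path that the respondent's own answers prescribe.
    while current is not None:
        answer = responses.get(current)
        targets = EDGES[current]
        if answer is not None:
            labels[current] = "answered"
            if answer not in targets and "*" not in targets:
                unresolved_from = current  # answer matches no outgoing edge
                break
            current = targets.get(answer, targets.get("*"))
        else:
            labels[current] = "genuine_missing"
            successors = set(targets.values())
            if len(successors) > 1:
                unresolved_from = current  # blank branching question
                break
            current = next(iter(successors))

    # Label the questions that the walk did not visit.
    for q in ORDER:
        if q in labels:
            continue
        blank = responses.get(q) is None
        after_break = (
            unresolved_from is not None
            and ORDER.index(q) > ORDER.index(unresolved_from)
        )
        if after_break:
            labels[q] = "undetermined" if blank else "answered"
        else:
            labels[q] = "non_applicable" if blank else "inconsistent"
    return labels
```

For instance, under these assumptions a respondent who answers "no" to `Q1` but still answers `Q2` would have `Q2` labelled inconsistent and the blank `Q3` labelled non-applicable, while a respondent who leaves `Q1` blank would have the blank `Q2` and `Q3` labelled undetermined, since it cannot be decided whether they were skipped or omitted.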