A LIKELIHOOD BASED GROUPING METHOD FOR MULTIDIMENSIONAL OBSERVATIONS: AN INTERACTIVE METHOD USED ON LARGE TRANSPORTATION DATA BASES

This paper discusses a clustering method that partitions data into homogeneous subpopulations. It uses Fisher's likelihood theory to define the distance between groups of data, so its dissimilarity measure depends on the probability density function of the observations. Three dissimilarity measures are treated, each based on a different data distribution and each consistent with likelihood theory. Advantages of the method include detection of dependencies between variables; curtailment of cluster and segmentation techniques; and an optimal (dis)similarity measure for each probability function, derived using likelihood theory. The method retains all the advantages of hierarchical divisive techniques, which makes it suitable for the analysis of large surveys. It was developed for transportation science but is suitable for other research involving large data sets.
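
To make the idea of a likelihood-based dissimilarity concrete, the following is a minimal sketch, not the authors' formulation. It assumes, purely for illustration, that observations are Poisson-distributed counts (the abstract does not name the three distributions treated), and it measures the dissimilarity between two groups as the drop in maximized log-likelihood when the groups are forced to share one rate parameter. The function names `poisson_lr_dissimilarity` and `_xlogy` are hypothetical.

```python
import math


def _xlogy(total, rate):
    # Convention 0 * log(0) = 0, needed when a group contains only zeros.
    return total * math.log(rate) if total > 0 else 0.0


def poisson_lr_dissimilarity(group_a, group_b):
    """Log-likelihood lost by merging two groups of counts (assumed Poisson).

    Each group is fitted with its own maximum-likelihood rate (its mean);
    the merged group is fitted with the pooled mean. The log(x!) terms and
    the terms linear in the rate cancel, leaving only sum(x) * log(rate).
    """
    n_a, n_b = len(group_a), len(group_b)
    s_a, s_b = sum(group_a), sum(group_b)
    rate_a = s_a / n_a                      # MLE for group A
    rate_b = s_b / n_b                      # MLE for group B
    rate_ab = (s_a + s_b) / (n_a + n_b)     # MLE for the merged group
    return (_xlogy(s_a, rate_a)
            + _xlogy(s_b, rate_b)
            - _xlogy(s_a + s_b, rate_ab))


if __name__ == "__main__":
    # Hypothetical data, e.g. daily trip counts per household in two segments.
    similar = poisson_lr_dissimilarity([1, 2, 3], [2, 2, 2])
    different = poisson_lr_dissimilarity([1, 1, 1], [5, 5, 5])
    print(similar, different)
```

Under this assumption the measure is zero when both groups have the same mean and grows as their means diverge. A divisive procedure could, at each step, choose the split that maximizes it; a different assumed distribution would yield a different measure by the same construction.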