Clustering Individuals in Non-vector Data and Predicting: A Novel Model-Based Approach

In many applications, data is non-vector in nature. For example, one might have transaction data from a dialup access system, where each customer has an observed time series of dialup events whose start times and durations differ from customer to customer. Such data is difficult to convert to vector form, so existing algorithms designed for vector data [5] are poorly suited to clustering customers by their dialup events. This paper presents an efficient model-based algorithm for clustering individuals whose data is non-vector in nature. We evaluate the algorithm on a large data set of dialup transactions to show that it is fast and scalable for clustering and accurate for prediction. We also compare it with a vector clustering algorithm in terms of prediction accuracy, to show that the model-based algorithm is better suited to non-vector data.