Paper information - Clustering Individuals in Non-vector Data and Predicting: A Novel Model-Based Approach

In many applications, data is non-vector in nature. For example, one might have transaction data from a dialup access system, where each customer has an observed time series of dialups whose start times and durations differ from customer to customer. It is difficult to convert this type of data to vector form, so existing algorithms designed for vector data [5] struggle to cluster customers by their dialup events. This paper presents an efficient model-based algorithm for clustering individuals whose data is non-vector in nature. We then evaluate it on a large data set of dialup transactions to show that the algorithm is fast and scalable for clustering and accurate for prediction. We also compare it with a vector clustering algorithm on prediction accuracy, to show that the model-based algorithm is better suited to non-vector data.

Jianmin Wang | Jia-Guang Sun | Deyi Li | Kedong Luo

[1] Padhraic Smyth, et al. Visualization of navigation patterns on a Web site using model-based clustering, 2000, KDD '00.

[2] D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper, 1977.

[3] Christos Faloutsos, et al. Efficient Similarity Search In Sequence Databases, 1993, FODO.

[4] Jiawei Han, et al. Data Mining: Concepts and Techniques, 2000.