Probabilistic modeling of transaction data with applications to profiling, visualization, and prediction

Transaction data is ubiquitous in data mining applications. Examples include market basket data in retail commerce, telephone call records in telecommunications, and Web logs of individual page requests at Web sites. Profiling consists of using historical transaction data on individuals to construct a model of each individual's behavior. Simple profiling techniques such as histograms do not generalize well from sparse transaction data. In this paper we investigate the application of probabilistic mixture models to automatically generate profiles from large volumes of transaction data. In effect, the mixture model represents each individual's behavior as a linear combination of "basis transactions." We evaluate several variations of the model on a large retail transaction data set and show that the proposed model provides improved predictive power over simpler histogram-based techniques, while also being relatively scalable, interpretable, and flexible. In addition, we point to applications in outlier detection, customer ranking, and interactive visualization, among others. The paper concludes by comparing and relating the proposed framework to other transaction-data modeling techniques such as association rules.
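As a minimal sketch of the "linear combination of basis transactions" idea, the Python fragment below builds one individual's profile as a weighted sum of component distributions over items and contrasts it with a raw histogram. The item names, the two component distributions, the mixing weights, and the function names are all invented for illustration and are not taken from the paper; in practice such quantities would be estimated from data.

```python
# Hypothetical item categories; not from the paper.
ITEMS = ["shirts", "ties", "socks", "dresses", "shoes"]

# Two assumed "basis transactions": each row is a probability
# distribution over ITEMS (values are made up and sum to 1).
BASIS = [
    [0.40, 0.30, 0.20, 0.00, 0.10],  # component 0
    [0.05, 0.00, 0.05, 0.50, 0.40],  # component 1
]


def mixture_profile(weights, basis):
    """P(item | individual) = sum_k weights[k] * basis[k][item]."""
    n_items = len(basis[0])
    return [
        sum(w * component[j] for w, component in zip(weights, basis))
        for j in range(n_items)
    ]


def histogram_profile(counts):
    """Relative frequencies of the individual's own purchases."""
    total = sum(counts)
    return [c / total for c in counts]


# A sparse purchase history: 2 shirts and 1 pair of shoes.
counts = [2, 0, 0, 0, 1]

# Assumed individual-specific mixing weights (must sum to 1).
weights = [0.7, 0.3]

mix = mixture_profile(weights, BASIS)
hist = histogram_profile(counts)

for item, p_mix, p_hist in zip(ITEMS, mix, hist):
    print(f"{item:8s} mixture={p_mix:.3f} histogram={p_hist:.3f}")
```

Under these assumed numbers, the histogram assigns zero probability to every item the individual has not yet bought, whereas the mixture profile assigns each of them a nonzero probability by borrowing strength from the shared components. This is one way to read the claim that histograms generalize poorly from sparse data.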