Transaction data is ubiquitous in data mining applications. Examples include market basket data in retail commerce, telephone call records in telecommunications, and Web logs of individual page requests at Web sites. Profiling consists of using historical transaction data on individuals to construct a model of each individual's behavior. Simple profiling techniques such as histograms do not generalize well from sparse transaction data. In this paper we investigate the application of probabilistic mixture models to automatically generate profiles from large volumes of transaction data. In effect, the mixture model represents each individual's behavior as a linear combination of "basis transactions." We evaluate several variations of the model on a large retail transaction data set and show that the proposed model provides improved predictive power over simpler histogram-based techniques, while also being relatively scalable, interpretable, and flexible. In addition, we outline applications to outlier detection, customer ranking, and interactive visualization. The paper concludes by comparing and relating the proposed framework to other transaction-data modeling techniques such as association rules.
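The central idea is that each individual's purchases are modeled as a weighted combination of shared "basis transactions." The short Python sketch below illustrates this idea and contrasts it with a per-individual histogram. It is a minimal illustration under stated assumptions and is not the paper's implementation. The item catalog, the two hand-set components, and all function names (`histogram_profile`, `fit_weights`, `mixture_profile`) are hypothetical. The sketch also assumes that each whole basket is generated by a single multinomial component over items, and it fits only the individual's mixture weights by EM, holding the components fixed.

```python
import math
from typing import Dict, List

Basket = List[str]          # one transaction: the items bought, with repeats
Profile = Dict[str, float]  # a probability distribution over items

# Hypothetical "basis transactions": K = 2 multinomial components over the same
# item catalog, fixed by hand for illustration. In practice they would be learned
# from all individuals' data. Every item must have nonzero probability here.
ITEMS = ["milk", "bread", "beer", "chips"]
COMPONENTS: List[Profile] = [
    {"milk": 0.50, "bread": 0.40, "beer": 0.05, "chips": 0.05},  # "breakfast" basket
    {"milk": 0.05, "bread": 0.05, "beer": 0.50, "chips": 0.40},  # "party" basket
]


def histogram_profile(transactions: List[Basket]) -> Profile:
    """Baseline profile: empirical item frequencies from one individual's history."""
    counts = {item: 0 for item in ITEMS}
    for basket in transactions:
        for item in basket:
            counts[item] += 1
    total = sum(counts.values())
    return {item: c / total for item, c in counts.items()}


def basket_log_likelihood(basket: Basket, component: Profile) -> float:
    """log P(basket | component), treating each purchased item as an independent draw."""
    return sum(math.log(component[item]) for item in basket)


def fit_weights(transactions: List[Basket], components: List[Profile],
                n_iter: int = 100) -> List[float]:
    """EM for one individual's mixture weights, with the components held fixed.

    Assumes each whole basket was generated by exactly one component.
    """
    k = len(components)
    alpha = [1.0 / k] * k
    for _ in range(n_iter):
        totals = [0.0] * k
        for basket in transactions:
            # E-step: posterior probability that this basket came from component j,
            # computed in log space and normalized with the max-subtraction trick.
            log_joint = [
                (math.log(alpha[j]) if alpha[j] > 0 else float("-inf"))
                + basket_log_likelihood(basket, components[j])
                for j in range(k)
            ]
            top = max(log_joint)
            weights = [math.exp(v - top) for v in log_joint]
            norm = sum(weights)
            for j in range(k):
                totals[j] += weights[j] / norm
        # M-step: each new weight is the average responsibility over baskets.
        alpha = [t / len(transactions) for t in totals]
    return alpha


def mixture_profile(alpha: List[float], components: List[Profile]) -> Profile:
    """Predictive item distribution: the components combined with weights alpha."""
    return {item: sum(a * comp[item] for a, comp in zip(alpha, components))
            for item in ITEMS}


if __name__ == "__main__":
    history = [["milk", "bread"], ["milk", "bread", "milk"], ["beer"]]
    alpha = fit_weights(history, COMPONENTS)
    print("weights:", [round(a, 3) for a in alpha])
    print("histogram P(chips):", histogram_profile(history)["chips"])
    print("mixture   P(chips):", round(mixture_profile(alpha, COMPONENTS)["chips"], 3))
```

Consider a history with two breakfast-style baskets and one basket containing only beer. The histogram assigns chips zero probability because this individual never bought them. The mixture gives chips nonzero probability through the weight placed on the "party" component. This illustrates, in a toy setting, why a histogram generalizes poorly from sparse data while a combination of shared components does not.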