Building on the Arules Infrastructure for Analyzing Transaction Data with R

The free and extensible statistical computing environment R, with its large number of extension packages, already provides many state-of-the-art techniques for data analysis. Support for association rule mining, a popular exploratory method that can be used, among other purposes, to uncover cross-selling opportunities in market baskets, has recently become available through the R extension package arules. After a brief introduction to transaction data and association rules, we present the formal framework implemented in arules and demonstrate how clustering and association rule mining can be applied together, using a market basket data set from a typical retailer. This paper shows that implementing a basic infrastructure with formal classes in R provides an extensible basis that can be reused efficiently to develop new applications (such as clustering transactions) in addition to association rule mining.
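The concepts named in the abstract (transactions as item sets, support and confidence of association rules, and clustering transactions by their similarity) can be made concrete with a small, language-neutral sketch. It is written in Python and is not the arules API: arules is an R package with its own classes and functions. The toy baskets, the thresholds, and every function name below (`support`, `confidence`, `mine_rules`, `jaccard_dissimilarity`, `k_medoids_exhaustive`) are hypothetical. Rules are found by brute-force enumeration rather than by a real miner such as Apriori. Transactions are clustered with Jaccard dissimilarity and an exhaustive k-medoids search that stands in for PAM. This is one plausible setup, and the paper's own clustering choices may differ.

```python
from itertools import combinations

# Hypothetical toy market baskets (illustration only, not the paper's data).
transactions = [
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "bread"},
    {"beer", "chips"},
    {"milk", "bread", "butter", "chips"},
]


def support(itemset, db):
    """Fraction of transactions that contain every item of itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in db) / len(db)


def confidence(lhs, rhs, db):
    """supp(lhs together with rhs) / supp(lhs), an estimate of P(rhs | lhs)."""
    s = support(lhs, db)
    return support(set(lhs) | set(rhs), db) / s if s else 0.0


def mine_rules(db, min_supp, min_conf):
    """Brute-force search for rules X => {y} with a single-item right-hand side.
    Only feasible for toy data; Apriori-style miners prune the search instead."""
    items = sorted(set().union(*db))
    found = []
    for k in range(2, len(items) + 1):
        for itemset in combinations(items, k):
            supp = support(itemset, db)
            if supp < min_supp:
                continue  # infrequent item set: no rule can be built from it
            for y in itemset:
                lhs = tuple(i for i in itemset if i != y)
                conf = confidence(lhs, [y], db)
                if conf >= min_conf:
                    found.append((lhs, y, supp, conf))
    return found


def jaccard_dissimilarity(a, b):
    """1 - |a & b| / |a | b|: 0 for identical baskets, 1 for disjoint ones."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


def dissimilarity_matrix(db):
    """Full pairwise Jaccard dissimilarity matrix between transactions."""
    return [[jaccard_dissimilarity(x, y) for y in db] for x in db]


def k_medoids_exhaustive(D, k):
    """Exact k-medoids by trying every set of k medoids (toy sizes only).
    PAM searches for good medoids heuristically on realistic data."""
    n = len(D)
    best_cost, best = None, None
    for medoids in combinations(range(n), k):
        # Each transaction contributes its dissimilarity to the closest medoid.
        cost = sum(min(D[i][m] for m in medoids) for i in range(n))
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, medoids
    labels = [min(best, key=lambda m: D[i][m]) for i in range(n)]
    return best, labels


for lhs, rhs, supp, conf in mine_rules(transactions, min_supp=0.4, min_conf=0.8):
    print(f"{{{', '.join(lhs)}}} => {{{rhs}}}  support={supp:.2f}  confidence={conf:.2f}")

medoids, labels = k_medoids_exhaustive(dissimilarity_matrix(transactions), k=2)
print("medoids:", medoids, "labels:", labels)
```

With a minimum support of 0.4 and a minimum confidence of 0.8, the sketch reports three rules, all with bread on the right-hand side: {butter} => {bread}, {milk} => {bread}, and {butter, milk} => {bread}. The two-medoid clustering separates the beer-and-chips basket (index 3) from the four bread-based baskets.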
