Trans2Vec: Learning Transaction Embedding via Items and Frequent Itemsets

Learning meaningful and effective representations for transaction data is a crucial prerequisite for transaction classification and clustering tasks. Traditional methods that use frequent itemsets (FIs) as features often suffer from data sparsity and high dimensionality. Several supervised methods based on discriminative FIs have been proposed to address these problems, but they require transaction labels, which makes them inapplicable in real-world settings where labels are unavailable. In this paper, we propose an unsupervised method that learns low-dimensional continuous vectors for transactions using information from both singleton items and FIs. On four datasets, our method outperforms several state-of-the-art baselines in transaction classification.
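
To make the traditional FI-as-features representation concrete, the following is a minimal sketch, not the method proposed in this paper. It assumes a toy transaction set, a hypothetical `min_support` threshold, and brute-force enumeration of itemsets (exponential in transaction length, so suitable only for illustration). The function names `frequent_itemsets` and `binary_features` are invented for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for itemsets that occur in at
    least `min_support` transactions (brute force, illustration only)."""
    counts = {}
    for t in transactions:
        items = sorted(set(t))
        # Enumerate every non-empty subset of the transaction.
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                counts[combo] = counts.get(combo, 0) + 1
    return {fi: c for fi, c in counts.items() if c >= min_support}


def binary_features(transactions, itemsets):
    """Encode each transaction as a 0/1 vector with one column per FI."""
    columns = sorted(itemsets, key=lambda fi: (len(fi), fi))
    rows = [
        [1 if set(fi) <= set(t) else 0 for fi in columns]
        for t in transactions
    ]
    return columns, rows


# Toy data (assumed for illustration, not from the paper's datasets).
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"]]
fis = frequent_itemsets(transactions, min_support=2)
columns, X = binary_features(transactions, fis)

for t, row in zip(transactions, X):
    print(t, row)
```

Even on four transactions, the feature space has one column per FI. On real data the number of FIs can grow very large while each transaction contains only a few of them, which is the sparsity and high-dimensionality problem that motivates learning dense, low-dimensional vectors instead.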