WebDocs: a real-life huge transactional dataset

The resulting dataset has a size of about 1.48 GB. It contains exactly 1,692,082 transactions with 5,267,656 distinct items. The maximal length of a transaction is 71,472 items. Figure 1 plots the number of frequent itemsets as a function of the support threshold, while Figure 2 shows a bitmap representing the horizontal dataset, with items sorted by frequency. To reduce the size of the bitmap, each pixel does not correspond to a single item in a single transaction. Instead, it was obtained by counting the occurrences of a group of items with consecutive identifiers within a block of consecutive transactions, and assigning a gray level proportional to that count.

[Figure 1: Frequent itemsets in the WebDocs dataset. Number of frequent itemsets (logarithmic axis, 10^1 to 10^7) as a function of the support threshold (axis ticks from 5 to 40).]
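The bitmap reduction described for Figure 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used to produce the figure: the function name `density_bitmap`, the parameters `item_group` and `tx_group`, the descending frequency order, the 0 to 255 gray scale, and the convention that a larger count gives a larger value are all hypothetical choices made for the example.

```python
from collections import Counter


def density_bitmap(transactions, item_group, tx_group):
    """Return a grid of gray levels (0..255) summarizing a horizontal dataset.

    Each row covers `tx_group` consecutive transactions and each column covers
    `item_group` items with consecutive ranks (hypothetical parameters).
    """
    # Re-number items by descending frequency; ties are broken by item id
    # (assumed ordering, the source only says items were sorted by frequency).
    freq = Counter(item for t in transactions for item in t)
    ordered = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
    rank = {item: r for r, (item, _) in enumerate(ordered)}

    # Ceiling division so the last, possibly partial, group gets its own cell.
    n_cols = (len(rank) + item_group - 1) // item_group
    n_rows = (len(transactions) + tx_group - 1) // tx_group
    counts = [[0] * n_cols for _ in range(n_rows)]

    # Count item occurrences per (transaction block, item group) cell.
    for ti, t in enumerate(transactions):
        row = counts[ti // tx_group]
        for item in set(t):  # an item counts once per transaction
            row[rank[item] // item_group] += 1

    # Scale counts to gray levels proportional to the count (integer rounding).
    peak = max((c for row in counts for c in row), default=0)
    if peak == 0:
        return counts
    return [[(255 * c + peak // 2) // peak for c in row] for row in counts]


if __name__ == "__main__":
    toy = [[1, 2], [1, 3], [1, 2], [4]]
    for line in density_bitmap(toy, item_group=2, tx_group=2):
        print(line)
```

On a dataset the size of WebDocs the transactions would be streamed from disk in two passes (one for the frequencies, one for the counts) rather than held in a list; the in-memory list here only keeps the sketch short.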