Mining Frequent Patterns in Print Logs with Semantically Alternative Labels

It is common today for users to print the informative information from webpages due to the popularity of printers and internet. Thus, many web printing tools such as Smart Print and PrintUI are developed for online printing. In order to improve the users' printing experience, the interaction data between users and these tools are collected to form a so-called print log data, where each record is the set of urls selected for printing by a user within a certain period of time. Apparently, mining frequent patterns from these print log data can capture user intentions for other applications, such as printing recommendation and behavior targeting. However, mining frequent patterns by directly using url as item representation in print log data faces two challenges: data sparsity and pattern interpretability. To tackle these challenges, we attempt to leverage delicious api a social bookmarking web service as an external thesaurus to expand the semantics of each url by selecting tags associated with the domain of each url. In this setting, the frequent pattern mining is employed on the tag representation of each url rather than the url or domain representation. With the enhancement of semantically alternative tag representation, the semantics of url is substantially improved, thus yielding the useful frequent patterns. To this end, in this paper we propose a novel pattern mining problem, namely mining frequent patterns with semantically alternative labels, and propose an efficient algorithm named PaSAL Frequent Patterns with Semantically Alternative Labels Mining Algorithm for this problem. Specifically, we propose a new constraint named conflict matrix to purify the redundant patterns to achieve a high efficiency. Finally, we evaluate the proposed algorithm on a real print log data.

[1]  Jiawei Han,et al.  Discovery of Multiple-Level Association Rules from Large Databases , 1995, VLDB.

[2]  Jian Pei,et al.  Constrained frequent pattern mining: a pattern-growth view , 2002, SKDD.

[3]  Johannes Gehrke,et al.  MAFIA: a maximal frequent itemset algorithm for transactional databases , 2001, Proceedings 17th International Conference on Data Engineering.

[4]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[5]  Ping Luo,et al.  Incorporating occupancy into frequent pattern mining for high quality pattern recommendation , 2012, CIKM.

[6]  Luc De Raedt,et al.  Constraint-Based Pattern Set Mining , 2007, SDM.

[7]  ChengXiang Zhai,et al.  Learn from web search logs to organize search results , 2007, SIGIR.

[8]  Chun Zhang,et al.  Storing and querying ordered XML using a relational database system , 2002, SIGMOD '02.

[9]  Jiawei Han,et al.  Semantic annotation of frequent patterns , 2007, TKDD.

[10]  Nicolas Pasquier,et al.  Discovering Frequent Closed Itemsets for Association Rules , 1999, ICDT.

[11]  Jian Pei,et al.  Can we push more constraints into frequent pattern mining? , 2000, KDD '00.

[12]  Mohammed J. Zaki Scalable Algorithms for Association Mining , 2000, IEEE Trans. Knowl. Data Eng..

[13]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[14]  Jian Pei,et al.  Mining frequent patterns without candidate generation , 2000, SIGMOD 2000.

[15]  Ramakrishnan Srikant,et al.  Mining generalized association rules , 1995, Future Gener. Comput. Syst..

[16]  Malcolm P. Atkinson,et al.  Issues Raised by Three Years of Developing PJama: An Orthogonally Persistent Platform for Java , 1999, ICDT.