Weighted-Frequent Itemset Refinement Methodology (W-FIRM) of Usage Clusters

Information overload on the Internet has led to the development of a large number of systems for extracting user behavior. This paper presents the mining of Frequent Itemsets and the refinement of usage clusters for web-based applications. We consider the particular case in which a cluster contains an abundance of sessions, which leads to a very large number of uninteresting recommendations for the user. To address this problem, we refine clusters on the basis of Weighted Frequent Itemsets, which in turn yields refined clusters of improved quality. In the proposed work, Frequent Itemsets are sets of web pages whose occurrence in sessions exceeds a given threshold, known as the minimum support. The motivation for adopting Frequent Itemsets for refinement is the need for dimensionality reduction. Experimental results show that the proposed approach yields better cluster quality than existing approaches (DBS, 2011 and HITS, 2010). The refined clusters can then be used in a number of applications, such as Web Personalization, improvement of Web Site Structure, Analysis of Users’ Online Behavior, and Recommender System services.
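To make the notion of minimum support concrete, the following is a minimal sketch of frequent itemset extraction over a handful of sessions. It is an illustration only, not the W-FIRM procedure: the function name `frequent_itemsets`, the `max_size` cap, the brute-force enumeration, the sample page paths, and the reading of support as a fraction of sessions are all assumptions made here. Weighting is left out because the abstract does not specify how the weights are defined.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(sessions, min_support, max_size=3):
    """Return {itemset: support} for page sets whose support exceeds min_support.

    Support is taken here as the fraction of sessions that contain every
    page of the itemset (an assumption; a raw count would work the same way).
    """
    counts = Counter()
    for session in sessions:
        # Deduplicate and sort so each itemset has one canonical tuple form.
        pages = sorted(set(session))
        for size in range(1, max_size + 1):
            for itemset in combinations(pages, size):
                counts[itemset] += 1
    total = len(sessions)
    # Keep only itemsets that occur in more than min_support of the sessions.
    return {s: c / total for s, c in counts.items() if c / total > min_support}


# Hypothetical sessions, each a list of visited pages.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/about"],
    ["/products", "/cart"],
]
result = frequent_itemsets(sessions, min_support=0.4)
for itemset, support in sorted(result.items()):
    print(itemset, support)
```

With these sample sessions, `/about` appears in only one of four sessions and is discarded, while the pair `/home`, `/products` appears in two of four and is kept. Brute-force enumeration is practical only for short sessions; larger logs would call for an Apriori or FP-growth style miner.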
