MapReduce-based web mining for prediction of web-user navigation

Predicting web-user behaviour is a typical application of frequent-sequence-pattern mining. With the rapid growth of the Internet, a large amount of information is stored in web logs, and traditional frequent-sequence-pattern-mining algorithms struggle to analyse datasets of this size. In this paper, we propose an efficient way to predict the navigation patterns of web users by improving frequent-sequence-pattern-mining algorithms using the MapReduce programming model, which can handle huge datasets efficiently. Our experiments show that the proposed MapReduce-based algorithm is more efficient than traditional frequent-sequence-pattern-mining algorithms, and a comparison with existing web-usage-mining algorithms shows that using the MapReduce programming model saves time.
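
As a rough illustration of the general idea, the sketch below counts frequent contiguous page sequences with a map step, a shuffle step, and a reduce step, and then uses the counts to guess the next page. It is not the algorithm proposed in the paper, which the abstract does not specify. It runs in a single process rather than on Hadoop, and every function name, parameter (`min_support`, `max_len`), and the sample sessions are assumptions made for the example.

```python
from collections import defaultdict
from itertools import chain


def map_session(session, max_len=3):
    """Map step: emit (pattern, 1) once for each distinct contiguous
    page subsequence of length 1..max_len found in one user session."""
    seen = set()
    for start in range(len(session)):
        last = min(start + max_len, len(session))
        for end in range(start + 1, last + 1):
            seen.add(tuple(session[start:end]))
    return [(pattern, 1) for pattern in seen]


def shuffle(pairs):
    """Shuffle step: group emitted values by pattern, as a MapReduce
    framework would do between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups, min_support):
    """Reduce step: sum the counts per pattern and keep only patterns
    that occur in at least min_support sessions."""
    totals = {pattern: sum(values) for pattern, values in groups.items()}
    return {p: c for p, c in totals.items() if c >= min_support}


def mine_frequent_sequences(sessions, min_support=2, max_len=3):
    """Run map, shuffle, and reduce over all sessions (hypothetical driver)."""
    pairs = chain.from_iterable(map_session(s, max_len) for s in sessions)
    return reduce_counts(shuffle(pairs), min_support)


def predict_next(frequent, prefix):
    """Return the page that most often follows prefix among the frequent
    patterns, or None if no frequent pattern extends the prefix."""
    prefix = tuple(prefix)
    candidates = {
        p: c for p, c in frequent.items()
        if len(p) == len(prefix) + 1 and p[:-1] == prefix
    }
    if not candidates:
        return None
    # Ties are broken by pattern order so the result is deterministic.
    best = max(candidates, key=lambda p: (candidates[p], p))
    return best[-1]


if __name__ == "__main__":
    # Made-up sessions standing in for parsed web-log data.
    sessions = [
        ["home", "products", "cart"],
        ["home", "products", "checkout"],
        ["home", "products", "cart"],
        ["home", "about"],
    ]
    frequent = mine_frequent_sequences(sessions, min_support=2)
    print(predict_next(frequent, ["home", "products"]))
```

Because each session is mapped independently and the reduce step only sums per-pattern counts, the map and reduce functions could in principle be distributed across many workers, which is the property that makes this kind of counting a natural fit for MapReduce.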
