Distributed-based massive processing of activity logs for efficient user modeling in a Virtual Campus

This paper reports on a multi-fold approach to building user models based on the identification of navigation patterns in a virtual campus. Such models allow the campus's usability to be adapted to learners' actual needs, which in turn enhances the learning experience. However, user modeling in this context requires continuous processing and analysis of user interaction data during long-term learning activities, which produces huge amounts of valuable data, typically stored in server log files. Because the log files generated daily are large or very large, massive processing is an essential first step in extracting useful information. To this end, this work first studies the viability of processing the large log files of a real Virtual Campus using different distributed infrastructures. More precisely, we study the time performance of massive processing of daily log files, implemented following the master-slave paradigm and evaluated on Cluster Computing and PlanetLab platforms. The study reveals the complexity and challenges of massive processing in the big data era, such as the need to carefully tune the size of the log data chunks processed at slave nodes, and the processing bottleneck in truly geographically distributed infrastructures caused by the communication overhead between the master and slave nodes. We then present an application of the massive processing approach that yields log data processed and stored in a well-structured format. We show how to extract knowledge from the log data using the WEKA framework for data mining, and demonstrate its usefulness for building user models by identifying interesting navigation patterns of on-line learners. The study is motivated by and conducted on the actual log data of the Virtual Campus of the Open University of Catalonia.
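
The master-slave processing described above can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: local threads stand in for the slave nodes, and the `user_id action` line format, the function names, and the sample data are all hypothetical. It shows only how a master might cut a log into chunks of a tunable size, hand them to workers, and merge the partial results.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines, chunk_size):
    """Master side: cut the log into chunks of at most chunk_size lines."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def process_chunk(chunk):
    """Worker side: count log entries per user in one chunk.

    Assumes a hypothetical 'user_id action' line format.
    """
    counts = Counter()
    for line in chunk:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts


def process_log(lines, chunk_size, workers=4):
    """Master: dispatch chunks to workers and merge their partial counts."""
    chunks = split_into_chunks(lines, chunk_size)
    total = Counter()
    # Local threads stand in for remote slave nodes (an assumption).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(process_chunk, chunks):
            total.update(partial)
    return total


# Hypothetical sample data, not taken from the Virtual Campus logs.
sample_log = ["u1 login", "u2 login", "u1 open_forum", "u3 login", "u1 logout"]
result = process_log(sample_log, chunk_size=2)
```

In a sketch like this, `chunk_size` is the tuning knob: smaller chunks spread work more evenly but increase the number of master-worker exchanges, which is the kind of communication overhead the study reports as a bottleneck on geographically distributed nodes.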
