Knowledge Discovery from Log Data Analysis in a Multi-source Search System based on Deep Cleaning

In a multi-source search system, understanding users’ interests and behaviour is essential for improving search and adapting the results to each user profile. The information that characterizes users is often hidden in large log files, so it must be discovered, extracted and analyzed to build an accurate user profile. This paper presents an approach that analyzes the log data of a multi-source search system using web usage mining techniques. The aim is to capture, model and analyze the behavioural patterns and profiles of users interacting with the system. The proposed approach consists of two major steps: the first step, “pre-processing”, eliminates unwanted data from the log files based on predefined cleaning rules, and the second step, “processing”, extracts useful data about users’ previous queries. In addition to the conventional cleaning process, which removes irrelevant data from the log file such as accesses to multimedia files, error codes and accesses by Web robots, a deep cleaning step is proposed that analyzes the query structure of the different sources to eliminate further unwanted data. This reduces the volume of data to be handled and thereby accelerates the processing phase. The generated data can be used for personalizing user-system interaction, filtering information and recommending appropriate sources for the needs of each user.
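
The two cleaning stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the log layout, the cleaning rules or the query structure of the sources, so the sketch assumes Apache Combined Log Format, and the extension list, robot markers, and the `SOURCE_QUERY_PARAMS` mapping (source path to query parameter name) are hypothetical placeholders.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Assumption: Apache Combined Log Format; the paper does not specify the layout.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical conventional cleaning rules.
MEDIA_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".css", ".js", ".ico", ".mp4")
ROBOT_MARKERS = ("bot", "crawler", "spider")

# Hypothetical query structure: source path -> parameter that carries the user query.
SOURCE_QUERY_PARAMS = {"/search/books": "q", "/search/articles": "term"}


def parse_line(line):
    """Return the named fields of one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def passes_conventional_cleaning(entry):
    """Drop multimedia accesses, non-2xx responses and Web robot accesses."""
    path = urlsplit(entry["url"]).path.lower()
    if path.endswith(MEDIA_EXTENSIONS):
        return False
    if not entry["status"].startswith("2"):
        return False
    agent = entry["agent"].lower()
    if path == "/robots.txt" or any(marker in agent for marker in ROBOT_MARKERS):
        return False
    return True


def extract_query(entry):
    """Deep cleaning: keep only requests matching a known source query structure."""
    parts = urlsplit(entry["url"])
    param = SOURCE_QUERY_PARAMS.get(parts.path)
    if param is None:
        return None  # not a request to a known search source
    values = parse_qs(parts.query).get(param)
    if not values or not values[0].strip():
        return None  # query parameter missing or empty
    return parts.path, values[0].strip()


def clean_log(lines):
    """Return (host, source, query) tuples for the lines that survive both stages."""
    results = []
    for line in lines:
        entry = parse_line(line)
        if entry is None or not passes_conventional_cleaning(entry):
            continue
        query = extract_query(entry)
        if query is not None:
            results.append((entry["host"], query[0], query[1]))
    return results
```

Calling `clean_log` on an iterable of raw log lines yields one tuple per retained user query. In this sketch, the deep cleaning stage is what discards successful page requests that carry no query for a known source, so fewer entries reach the processing phase.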
