LODAP: a log data preprocessor for mining web browsing patterns

In this paper, we present LODAP, a log data preprocessor that extracts user sessions from the requests stored in the log file of a Web site. LODAP is composed of several modules. The data cleaning module removes useless records from the log file, retaining only the relevant requests that encode the user's navigational behaviour. The data structuration module groups the remaining requests into user sessions using a time-based method. The data filtering module then considerably reduces the size of the extracted session data by deleting the least visited pages and the uninteresting sessions. In addition, a data summarization module creates reports that summarize the information mined from the analyzed log file and contain the results provided by each module of LODAP. The tool has a wizard-based interface that guides the analyst through the preprocessing of the log data via a sequence of "panels", each of which is a graphical window offering one basic function of the preprocessor. Tests on the log files of a specific Web site show that LODAP can effectively reduce the size of the log dataset and identify significant user sessions.
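
To make the cleaning, structuration and filtering steps concrete, the following is a minimal Python sketch of such a pipeline. It is not LODAP's implementation: the record fields (`user`, `time`, `method`, `url`, `status`), the list of ignored file suffixes, the 30-minute session timeout and the two filtering thresholds are all assumptions chosen for illustration, and the function names are hypothetical.

```python
from collections import Counter
from datetime import timedelta

# Assumed suffixes of embedded resources treated as "useless" requests.
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")


def clean(records):
    """Data cleaning: keep successful GET requests for pages only."""
    return [
        r for r in records
        if r["method"] == "GET"
        and 200 <= r["status"] < 300
        and not r["url"].lower().endswith(IGNORED_SUFFIXES)
    ]


def sessionize(records, timeout=timedelta(minutes=30)):
    """Data structuration: time-based grouping of requests into sessions.

    A new session starts for a user when the gap since that user's
    previous request exceeds `timeout` (an assumed value).
    """
    sessions = []        # list of (user, [urls]) in order of creation
    last_seen = {}       # user -> (time of last request, current url list)
    for r in sorted(records, key=lambda rec: rec["time"]):
        previous = last_seen.get(r["user"])
        if previous is None or r["time"] - previous[0] > timeout:
            pages = []
            sessions.append((r["user"], pages))
        else:
            pages = previous[1]
        pages.append(r["url"])
        last_seen[r["user"]] = (r["time"], pages)
    return sessions


def filter_sessions(sessions, min_page_visits=2, min_session_length=2):
    """Data filtering: drop rarely visited pages, then too-short sessions.

    Both thresholds are hypothetical parameters.
    """
    visits = Counter(url for _, pages in sessions for url in pages)
    kept = []
    for user, pages in sessions:
        frequent = [p for p in pages if visits[p] >= min_page_visits]
        if len(frequent) >= min_session_length:
            kept.append((user, frequent))
    return kept
```

Calling `filter_sessions(sessionize(clean(records)))` on a list of parsed log records chains the three steps in the order described above. A fixed inactivity timeout is only one possible time-based criterion; a cap on total session duration would fit the same description.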