Paper information - Automatic discovery of the sequential accesses from web log data files via a genetic algorithm

Automatic discovery of the sequential accesses from web log data files via a genetic algorithm

Abstract This paper is concerned with finding sequential accesses in web log files using a genetic algorithm (GA). Web log files are independent of the server and are stored in ASCII format. Each transaction, whether completed or not, is recorded in the web log files, and these files are unstructured from the standpoint of knowledge discovery in databases techniques. The data stored in web logs has become important for discovering user behavior as internet use has increased rapidly, and analyzing these log files is one of the important research areas of web mining. In particular, with the advent of CRM (Customer Relationship Management) issues in business circles, most modern firms operating web sites for various purposes are now adopting web mining as a strategic way of capturing knowledge about the potential needs of target customers, future market trends, and other management factors. Our work, ALMG (Automatic Log Mining via Genetic), mines web log files with a genetic algorithm. In the web mining literature, GA is generally used for web content mining and web structure mining. ALMG, by contrast, is a study of web usage mining, and this is what distinguishes it from similar works in the literature. In another work that we encountered, GA is used to process the data between HTML tags located on the client PC, whereas ALMG extracts information from data located on the server. We consider using log files an advantage for our purpose, because we aim to characterize the requests made to the server rather than to detect a single person's behavior. We developed an application for this purpose: it first analyzes the web log files and then automatically finds groups of sequentially accessed pages.

Ahmet Arslan | Ayse Merve Sakiroglu | Emine Tug

[1] Byoung-Tak Zhang, et al. Genetic Mining of HTML Structures for Effective Web-Document Retrieval, 2003, Applied Intelligence.

[2] Ajith Abraham, et al. Web usage mining using artificial ant colony clustering and linear genetic programming, 2003, The 2003 Congress on Evolutionary Computation (CEC '03).

[3] Oren Etzioni, et al. The World-Wide Web: quagmire or gold mine?, 1996, CACM.

[4] Kun Chang Lee, et al. Fuzzy cognitive map approach to web-mining inference amplification, 2002, Expert Syst. Appl.

[5] Jaideep Srivastava, et al. Web usage mining: discovery and applications of usage patterns from Web data, 2000, SKDD.

[6] Nicolas Monmarché, et al. GeniMiner: Web Mining with a Genetic-Based Algorithm, 2002, ICWI.

[7] J. Leon Zhao, et al. Automatic discovery of similarity relationships through Web mining, 2003, Decis. Support Syst.

[8] Darrin J. Marshall. Data Mining using Genetic Algorithms, 1999.

[9] Padhraic Smyth, et al. From Data Mining to Knowledge Discovery: An Overview, 1996, Advances in Knowledge Discovery and Data Mining.

[10] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.

[11] Olfa Nasraoui, et al. From Static to Dynamic Web Usage Mining: Towards Scalable Profiling and Personalization with Evolutionary Computation, 2003.

[12] Anupam Joshi, et al. On Using a Warehouse to Analyze Web Logs, 2003.