Mining on-line newspaper web access logs

With the explosive growth of data available on the Internet, the discovery and analysis of useful information from web log data has become a practical necessity. However, analyzing large web log files is a complex task that existing web access analyzers do not fully address. Using commercial software, we applied data mining techniques to access log records collected from a web newspaper. We identify several reading patterns and discuss approaches for mining this data.
