Web Pattern Extraction and Storage

Web data provide information and knowledge that can be used to improve a website's content and structure. Indeed, they may contain knowledge suggesting changes that make a website more efficient and effective at attracting and retaining visitors. Using a Data Webhouse or a web analytics solution, it is possible to store statistical information about the behaviour of users on a website. Likewise, by applying web mining algorithms, interesting patterns can be discovered, interpreted and transformed into useful knowledge.

On the other hand, web data include large quantities of irrelevant data, so complex preprocessing must be applied in order to model and understand visitor browsing behaviour. However, there are many ways to preprocess web data and model browsing behaviour, so different patterns can be obtained depending on which model is used. A knowledge representation is therefore necessary to store and manipulate web patterns. In general, different patterns are discovered by applying distinct web mining techniques to web data that have undergone different treatments. Consequently, pattern metadata are needed to manipulate the discovered knowledge. This chapter covers feature selection, web mining techniques, model characterisation and pattern management, with the aim of building a repository that stores pattern metadata: a Pattern Webhouse that facilitates knowledge management in the web environment.
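To illustrate why the same log can yield different patterns depending on how it is preprocessed, here is a minimal Python sketch of a time-oriented session reconstruction heuristic. It assumes each request has already been attributed to a visitor key. The 30-minute (1,800 s) inactivity timeout is a commonly used default in the literature, not a value prescribed by this chapter. The names `Hit`, `sessionize` and `describe_run`, and the metadata field names, are hypothetical; they only suggest the kind of preprocessing metadata a pattern repository might store alongside the discovered patterns.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One page request; the visitor key is assumed to be precomputed (hypothetical)."""
    visitor: str
    timestamp: int  # seconds
    page: str


def sessionize(hits, timeout_s):
    """Split each visitor's clickstream into sessions: a new session starts
    when the gap between consecutive requests exceeds timeout_s seconds."""
    sessions = []
    open_session = {}  # visitor -> (timestamp of last request, pages in open session)
    for hit in sorted(hits, key=lambda h: (h.visitor, h.timestamp)):
        last = open_session.get(hit.visitor)
        if last is None or hit.timestamp - last[0] > timeout_s:
            pages = []              # start a new session for this visitor
            sessions.append(pages)
        else:
            pages = last[1]         # continue the visitor's open session
        pages.append(hit.page)
        open_session[hit.visitor] = (hit.timestamp, pages)
    return sessions


def describe_run(hits, timeout_s):
    """Bundle the sessions with the preprocessing parameters that produced them
    (hypothetical metadata layout)."""
    sessions = sessionize(hits, timeout_s)
    return {
        "preprocessing": {"method": "time-oriented sessionization", "timeout_s": timeout_s},
        "n_sessions": len(sessions),
        "sessions": sessions,
    }


# Toy log: visitor v1 pauses 35 minutes between /products and /cart.
LOG = [
    Hit("v1", 0, "/"), Hit("v1", 600, "/products"),
    Hit("v1", 2700, "/cart"), Hit("v1", 2760, "/checkout"),
    Hit("v2", 100, "/"), Hit("v2", 200, "/about"),
]

for timeout in (1800, 3600):
    record = describe_run(LOG, timeout)
    print(record["preprocessing"]["timeout_s"], record["n_sessions"])
# prints "1800 3" then "3600 2"
```

With a 30-minute timeout, v1's visit splits into two sessions; with a 60-minute timeout it stays as one. Any pattern mined from these sessions therefore depends on the timeout, which is why recording such parameters next to the patterns is useful.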
