A Roadmap for Web Mining: From Web to Semantic Web

The purpose of Web mining is to develop methods and systems for discovering models of objects and processes on the World Wide Web and for web-based systems that show adaptive performance. Web mining integrates three parent areas: Data Mining (we use this term here also for the closely related areas of Machine Learning and Knowledge Discovery), Internet technology and the World Wide Web, and the more recent Semantic Web. The World Wide Web has made an enormous amount of information electronically accessible. Email, news, and markup languages like HTML allow users to publish and read documents on a worldwide scale and to communicate via chat connections, including information in the form of images and voice recordings. The HTTP protocol, which enables access to documents over the network via Web browsers, brought an immense improvement in communication and access to information. For some years these possibilities were used mostly in the scientific world, but recent years have seen an immense growth in popularity, supported by the wide availability of computers and broadband communication. The use of the Internet for tasks other than finding information and direct communication is increasing, as can be seen from the interest in “e-activities” such as e-commerce, e-learning, e-government, and e-science.
