Data Mining Using Links in Open Hypermedia

We present a survey of data mining using web links. The XLink standard opens new possibilities for mining the web but also poses complex new problems. In this paper, we analyze the challenges posed by the XLink standard and propose a model for mining XLink information on the web. Our model combines local and global information in a distributed web environment with a dynamic approach to XLink paths in separate documents.
