Mining Relaxed Graph Properties in Internet

Many real-world datasets are represented as graphs. Classical graph properties found in the data, such as cliques or independent sets, can reveal new and interesting information. However, in a given context such properties can be either too rare or too trivial. By relaxing the criteria of the classical properties, we can find more patterns in the data, including entirely new ones. In this paper, we define relaxed graph properties and study their use in analyzing and processing graph-based data. In particular, we consider the problem of finding self-referring groups in the WWW, and give a general algorithm for mining all such patterns from a collection of WWW pages. We suggest that such self-referring groups can reveal web communities or other clusters in the WWW, and can also facilitate the compression of graph-formed data.
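The abstract does not spell out the relaxed criterion, so the sketch below assumes one purely for illustration. A "self-referring group" is taken to be a set of pages in which every page links to all but at most `slack` of the other pages in the group, in the spirit of a directed k-plex. With `slack = 0` this reduces to the strict, clique-like case, and larger values relax it. The function names `is_relaxed_group` and `mine_relaxed_groups` are hypothetical, and the brute-force subset enumeration is only a stand-in that makes the definition concrete. It is not the paper's general mining algorithm.

```python
from itertools import combinations


def is_relaxed_group(links, group, slack):
    """Check a hypothetical relaxed "self-referring" criterion (an assumption).

    Every page in `group` must link to all but at most `slack` of the
    other pages in the group. slack = 0 is the strict, clique-like case.
    """
    members = set(group)
    needed = max(0, len(members) - 1 - slack)
    for page in members:
        # Count out-links from `page` that stay inside the group, ignoring self-links.
        inside = len(links.get(page, set()) & (members - {page}))
        if inside < needed:
            return False
    return True


def mine_relaxed_groups(links, slack, min_size=3):
    """Enumerate every group of at least `min_size` pages meeting the criterion.

    Brute force over all subsets: exponential, so only suitable for tiny graphs.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    found = []
    for size in range(min_size, len(pages) + 1):
        for group in combinations(pages, size):
            if is_relaxed_group(links, group, slack):
                found.append(group)
    return found


if __name__ == "__main__":
    # Toy link graph: page -> set of pages it links to.
    links = {
        "a": {"b", "c"},
        "b": {"a", "c"},
        "c": {"a", "b"},
        "d": {"a"},
    }
    print(mine_relaxed_groups(links, slack=0))  # [('a', 'b', 'c')]
    print(mine_relaxed_groups(links, slack=1))  # [('a', 'b', 'c'), ('a', 'b', 'd'), ('a', 'c', 'd')]
```

On this toy graph the strict criterion finds only the fully interlinked triple, while allowing one missing link per page (`slack=1`) also admits the groups that include the weakly connected page `d`. This illustrates how relaxing a classical property surfaces additional patterns.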
