Bottom-up clustering and top-down shattering of scale-free environments for information fusion

We treat information fusion as an ‘active service’ that adapts the presentation of information to the user. Our work concerns the Internet, a scale-free small-world graph, and the tasks are document evaluation, novelty detection, and the collection of novel documents of ‘high value’ on the user's behalf. This procedure calls for user-computer interaction. To this end, four algorithms have been designed and are being tested in various Internet environments. The weblog algorithm uses competitive value-estimating agents and shatters the Internet domain. Bottom-up clustering builds tree-structured cluster hierarchies and eases navigation. Keyword extraction chooses the keywords that best match subsets of the clusters. Link highlighting uses user reinforcement, ranks Internet documents, and closes the loop: it provides feedback to the weblog algorithm to improve value estimation and the shattering of the domain. Details about the algorithms are provided.
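
To make the idea of bottom-up clustering concrete, the following is a minimal sketch of how a tree-structured cluster hierarchy could be grown from a small set of documents. It is an illustration under stated assumptions, not the method of this work: the bag-of-words representation, the cosine similarity measure, the summed-vector merging rule, and the names `cosine` and `bottom_up_cluster` are all hypothetical choices made for the example.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def bottom_up_cluster(docs):
    """Greedily merge the two most similar clusters until one tree remains.

    `docs` maps a document name to its text and is assumed to be non-empty.
    Each cluster is a pair (tree, vector): the tree is a name or a nested
    pair of subtrees, and the vector is the summed term counts of its leaves
    (an assumed merging rule, chosen for simplicity).
    """
    clusters = [(name, Counter(text.lower().split()))
                for name, text in docs.items()]
    while len(clusters) > 1:
        # Find the index pair of the two most similar clusters.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: cosine(clusters[p[0]][1], clusters[p[1]][1]),
        )
        (tree_i, vec_i), (tree_j, vec_j) = clusters[i], clusters[j]
        merged = ((tree_i, tree_j), vec_i + vec_j)
        # Drop the two merged clusters and append their parent node.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]
```

The returned nested tuples form a binary tree whose leaves are document names; a navigation aid could walk this tree from the root, and a keyword extractor could label each internal node from the terms of the leaves beneath it.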
