Correlated topologies in citation networks and the Web

Abstract. Information networks such as the scientific literature and the Web have been studied extensively by different communities, each focusing on different topological properties induced by citation links, textual content, and semantic relationships. This paper reviews work that brings these perspectives together in order to build better search tools and to understand how the Web's scale-free topology emerges from author behavior. I describe three topologies induced by different classes of similarity measures, and outline empirical data that allow us to quantify and map their correlations. The data are also used to study a power-law relationship between the content similarity of two documents and the probability that they are connected by citations or hyperlinks. This finding has led to a remarkably powerful growth model for information networks, which simultaneously predicts the distribution of degree and the distribution of content similarity across pairs of documents: Web pages connected by links and scientific articles connected by citations.
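
As a rough illustration of the measurement described in the abstract, the sketch below estimates how the probability of a link varies with content similarity. It is a minimal sketch under stated assumptions, not the paper's method: the abstract does not name a similarity measure, so cosine similarity over raw term counts is assumed here, links are treated as undirected, and the function names, bin count, and toy inputs are all hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between raw term-frequency vectors (assumed measure)."""
    tf_a = Counter(text_a.lower().split())
    tf_b = Counter(text_b.lower().split())
    # Only terms shared by both documents contribute to the dot product.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def link_probability_by_similarity(docs, links, n_bins=5):
    """Fraction of document pairs that are linked, per similarity bin.

    docs  : dict mapping a document id to its text
    links : iterable of (id, id) pairs, treated as undirected (assumption)
    Returns a list of length n_bins; a bin with no pairs yields None.
    """
    linked = {frozenset(pair) for pair in links}
    totals = [0] * n_bins
    hits = [0] * n_bins
    ids = sorted(docs)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            s = cosine_similarity(docs[a], docs[b])
            # Clamp so that a similarity of 1 falls in the last bin.
            k = min(int(s * n_bins), n_bins - 1)
            totals[k] += 1
            if frozenset((a, b)) in linked:
                hits[k] += 1
    return [h / t if t else None for h, t in zip(hits, totals)]
```

On a real corpus, one would plot these per-bin fractions against similarity on log-log axes to check whether a power-law relationship of the kind the abstract reports is plausible; the toy scale used here cannot show that by itself.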
