STRUCTURAL SIMILARITIES OF COMPLEX NETWORKS: A COMPUTATIONAL MODEL BY EXAMPLE OF WIKI GRAPHS

This article elaborates a framework for representing and classifying large complex networks, using wiki graphs as an example. With this framework, we reliably measure the similarity of document, agent, and word networks solely on the basis of their topology. In doing so, the article departs from classical approaches in complex network theory, which focus on topological characteristics in order to check whether a network has the small-world property. The characteristics we consider include not only those studied in complex network theory but also some that were introduced in social network analysis and hypertext theory. We show that this brings into reach network classifications that go beyond the hypertext structures traditionally analyzed in web mining, because we treat whole networks as the units to be classified, above the level of websites and their constituent pages. As a consequence, we bridge classical approaches to text and web mining on the one hand and complex network theory on the other. Finally, the approach provides a framework for quantifying the linguistic notion of intertextuality.
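The idea of comparing networks purely by their topology can be illustrated with a minimal sketch. It assumes undirected, unweighted graphs stored as adjacency sets, and a three-feature vector (density, average local clustering coefficient, average geodesic distance) chosen here for illustration only; the article's actual feature set and classifier are not specified in this abstract and may differ. Similarity is taken as the Euclidean distance between feature vectors, and the helper names (`from_edges`, `features`, `distance`) are hypothetical.

```python
import math
from collections import deque
from itertools import combinations


def from_edges(edges):
    """Build an undirected adjacency-set graph, ignoring self-loops (assumption)."""
    g = {}
    for u, v in edges:
        if u == v:
            continue
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g


def density(g):
    """Fraction of all possible undirected edges that are present."""
    n = len(g)
    m = sum(len(nb) for nb in g.values()) // 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def avg_clustering(g):
    """Watts-Strogatz local clustering averaged over all vertices (degree < 2 counts as 0)."""
    total = 0.0
    for v, nb in g.items():
        k = len(nb)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nb, 2) if b in g[a])
        total += 2 * links / (k * (k - 1))
    return total / len(g) if g else 0.0


def avg_path_length(g):
    """Mean BFS distance over all ordered pairs of mutually reachable vertices."""
    total, pairs = 0, 0
    for s in g:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in g[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def features(g):
    """Hypothetical topological feature vector used for comparison."""
    return [density(g), avg_clustering(g), avg_path_length(g)]


def distance(g, h):
    """Euclidean distance between feature vectors: 0 means topologically indistinguishable here."""
    return math.dist(features(g), features(h))


if __name__ == "__main__":
    triangle = from_edges([(0, 1), (1, 2), (2, 0)])
    path = from_edges([(0, 1), (1, 2), (2, 3)])
    print(features(triangle))  # [1.0, 1.0, 1.0]
    print(distance(triangle, path))
```

Because density and clustering lie in [0, 1] while path length grows with network size, a realistic comparison would standardize the features before computing distances, and directed wiki links would call for directed variants of these measures.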
