An algorithmic approach to social networks

A social network consists of a set of individuals and some form of social relationship that ties them together. In this thesis, we use algorithmic techniques to study three aspects of social networks: (1) we analyze the "small-world" phenomenon by examining the geographic patterns of friendships in a large-scale social network, showing how this linkage pattern can itself explain the small-world results; (2) using existing patterns of friendship in a social network and a variety of graph-theoretic techniques, we show how to predict new relationships that will form in the network in the near future; and (3) we show how to infer the social connections over which information flows in a network, by examining the times at which individuals in the network exhibit certain pieces of information or interest in certain topics. Our approach is simultaneously theoretical and data-driven: our results are based on experiments on real social-network data as well as on theoretical investigations of mathematical models of social networks. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
