Managing Uncertainty in Social Networks

Social network analysis (SNA) has become a mature scientific field over the last 50 years and is now an area with massive commercial appeal and renewed research interest. In this paper, we argue that new methods for collecting social network structure, and the shift in scale of these networks, introduce a greater degree of imprecision that requires rethinking how SNA techniques can be applied. We discuss a new area in data management, probabilistic databases, whose main research goal is to provide tools to manage and manipulate imprecise or uncertain data. We outline the application building blocks necessary to build a large-scale social networking application and the extent to which current research in probabilistic databases addresses these challenges.
