Link and Graph Mining in the Big Data Era

Graphs are a convenient representation for large data sets such as complex networks, social networks, and publication networks. The growing volume of data modeled as complex networks, e.g., the World Wide Web and social networks such as Twitter and Facebook, has given rise to a new area of research focused on mining complex networks. In this multidisciplinary area, several important tasks stand out: extraction of statistical properties, community detection, and link prediction, among others. This line of research has been driven largely by the growing availability of computers and communication networks, which allow us to gather and analyze data on a scale far larger than previously possible. In this chapter we give an overview of several graph mining approaches for mining and handling large complex networks.
