Efficient Node Proximity and Node Significance Computations in Graphs

Node proximity measures are commonly used to quantify how close, or otherwise related, two or more nodes in a graph are. Node significance measures are mainly used to determine how important individual nodes are in a graph. Both kinds of measures have proven highly effective in many prediction tasks and applications. Despite their effectiveness, however, they have several shortcomings. The first is scalability: their computation costs are high on large graphs. The second is accuracy: the measures are less accurate when the significance of a node is not related to its degree in the graph. The third is uncertainty: the measures are less effective when information about the graph is uncertain, because computing ranking scores over all possible worlds of an uncertain graph has exponential cost. In this thesis, I first introduce Locality-sensitive, Re-use promoting, approximate Personalized PageRank (LR-PPR), which approximates personalized PageRank by computing node rankings from the locality information of the seeds, without processing the entire graph, and by reusing the precomputed locality information across different locality combinations. To identify the locality information, I present Impact Neighborhood Indexing (INI), which finds impact neighborhoods by propagating nodes' fingerprints over the network. To address the accuracy challenge, I introduce the Degree Decoupled PageRank (D2PR) technique, which improves the effectiveness of PageRank-based knowledge discovery, especially with respect to the significance of the neighbors and the degree of a given node.
To tackle the uncertainty challenge, I introduce Uncertain Personalized PageRank (UPPR), which approximately computes personalized PageRank values when the existence of edges is uncertain, and Interval Personalized PageRank with Integration (IPPR-I) and Interval Personalized PageRank with Mean (IPPR-M), which compute ranking scores when the uncertainty lies in edge weights given as interval values.
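For orientation, the baseline that the approximate methods above aim to avoid can be sketched as a plain power iteration over the whole graph. This is a minimal illustration of standard personalized PageRank, not the thesis's LR-PPR, D2PR, or UPPR algorithms. The function name `personalized_pagerank`, the adjacency-dictionary input `out_edges`, the residual probability `alpha = 0.85`, and the choice to return the mass of dangling nodes to the seeds are all assumptions made for this sketch.

```python
def personalized_pagerank(out_edges, seeds, alpha=0.85, tol=1e-10, max_iter=1000):
    """Personalized PageRank by power iteration (illustrative baseline only).

    out_edges: dict mapping each node to a list of its out-neighbors.
    seeds:     non-empty collection of seed nodes that receive the teleport mass.
    alpha:     probability of following an edge (1 - alpha is the teleport rate).
    """
    seed_set = set(seeds)
    if not seed_set:
        raise ValueError("at least one seed node is required")

    # Collect every node that appears as a source or as a target.
    nodes = set(out_edges)
    for targets in out_edges.values():
        nodes.update(targets)
    nodes.update(seed_set)

    # Teleport vector: uniform over the seeds, zero elsewhere.
    teleport = {v: (1.0 / len(seed_set) if v in seed_set else 0.0) for v in nodes}
    scores = dict(teleport)

    for _ in range(max_iter):
        nxt = {v: (1.0 - alpha) * teleport[v] for v in nodes}
        for u in nodes:
            targets = out_edges.get(u, [])
            if targets:
                # Spread the walker's mass evenly over the out-neighbors.
                share = alpha * scores[u] / len(targets)
                for v in targets:
                    nxt[v] += share
            else:
                # Assumption: a dangling node sends its mass back to the seeds.
                for v in seed_set:
                    nxt[v] += alpha * scores[u] * teleport[v]
        delta = sum(abs(nxt[v] - scores[v]) for v in nodes)
        scores = nxt
        if delta < tol:
            break
    return scores


# Tiny directed cycle a -> b -> c -> a, personalized on the single seed "a".
graph = {"a": ["b"], "b": ["c"], "c": ["a"]}
scores = personalized_pagerank(graph, seeds=["a"])
```

Every iteration touches every node and edge, which is the scalability cost that motivates restricting the computation to the neighborhoods of the seeds and reusing precomputed results.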
