Finding Representative Nodes in Probabilistic Graphs

We introduce the problem of identifying representative nodes in probabilistic graphs, motivated by the need to produce different simple views of large BisoNets. We define a probabilistic similarity measure for nodes and then apply clustering methods to find groups of nodes. Finally, one representative is output from each cluster. We report on experiments with real biomedical data, using both k-medoids and hierarchical clustering in the clustering step. The results suggest that the clustering-based approaches are capable of finding a representative set of nodes.
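The three steps above (similarity, clustering, one representative per cluster) can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, so the sketch assumes the probability of the most probable path between two nodes, and it uses a simplified alternating k-medoids with a naive initialisation. The function names `build_graph`, `best_path_probabilities`, and `find_representatives` are hypothetical, and the code is not the authors' implementation.

```python
import heapq
import math


def build_graph(edges):
    """Build an undirected adjacency dict from (u, v, probability) triples."""
    graph = {}
    for u, v, p in edges:
        graph.setdefault(u, {})[v] = p
        graph.setdefault(v, {})[u] = p
    return graph


def best_path_probabilities(graph, source):
    """Assumed similarity: probability of the most probable path from source.

    Maximising a product of edge probabilities equals minimising the sum of
    -log(p), so Dijkstra's algorithm applies. Edge probabilities must lie
    in (0, 1].
    """
    cost = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        c, u = heapq.heappop(heap)
        if c > cost[u]:
            continue  # stale heap entry
        for v, p in graph[u].items():
            new_cost = c - math.log(p)
            if new_cost < cost.get(v, math.inf):
                cost[v] = new_cost
                heapq.heappush(heap, (new_cost, v))
    return {v: math.exp(-c) for v, c in cost.items()}


def find_representatives(graph, k, max_iter=100):
    """Simplified k-medoids on the similarity; the medoids are the representatives."""
    nodes = sorted(graph)
    sim = {u: best_path_probabilities(graph, u) for u in nodes}
    medoids = nodes[:k]  # naive deterministic initialisation (an assumption)
    for _ in range(max_iter):
        # Assignment step: each node joins its most similar medoid.
        clusters = {m: [] for m in medoids}
        for u in nodes:
            if u in clusters:
                nearest = u  # a medoid always stays in its own cluster
            else:
                nearest = max(medoids, key=lambda m: sim[u].get(m, 0.0))
            clusters[nearest].append(u)
        # Update step: the new medoid maximises total similarity to its cluster.
        new_medoids = [
            max(members, key=lambda c: sum(sim[c].get(u, 0.0) for u in members))
            for members in clusters.values()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return sorted(medoids)


# Two tightly linked groups joined by one weak edge (toy data, not from the paper).
edges = [("a", "b", 0.9), ("b", "c", 0.9), ("c", "d", 0.1),
         ("d", "e", 0.9), ("e", "f", 0.9)]
representatives = find_representatives(build_graph(edges), k=2)
print(representatives)
```

On this toy graph the sketch selects the middle node of each strongly connected group, `b` and `e`. Hierarchical clustering, the other method named in the abstract, could be substituted for the k-medoids step while keeping the same similarity.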
