Clustering Enormous and Uncertain Data Streams

Complex networks, such as biological, social, and communication networks, often entail uncertainty and can therefore be modelled as probabilistic graphs. In a clustering problem, one has to partition a set of elements into homogeneous and well-separated subsets. From a graph-theoretic point of view, a cluster graph is a vertex-disjoint union of cliques, and the clustering problem is the task of making the fewest changes to the edge set of an input graph so that it becomes a cluster graph. Clustering a probabilistic graph is similar to clustering a standard (deterministic) graph. Probabilistic graph clustering has numerous applications, such as finding complexes in probabilistic protein-protein interaction (PPI) networks and discovering groups of users in affiliation networks. In the proposed system, the probabilistic graph is generated from the probability of occurrence of the nodes in the affiliation network, and the deterministic graph is generated from the active nodes in that network. The proposed system establishes a connection between its objective function and correlation clustering in order to derive practical approximation algorithms for the probabilistic graph clustering problem. A benefit of the proposed approach is that the objective function is parameter-free; the number of clusters is therefore part of the output rather than an input. The proposed method discovers the correct number of clusters and identifies established protein relationships.
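
To make the link between the edit-based objective and correlation clustering concrete, the following is a minimal sketch rather than the proposed system's actual algorithm. It assumes that edges exist independently with the given probabilities, that the objective is the expected number of edge changes needed to turn a sampled graph into the chosen cluster graph, and that a simple random-pivot heuristic from the correlation clustering literature is an acceptable stand-in. The names `edge_prob`, `expected_edit_distance`, `pivot_clustering`, and the toy data are hypothetical.

```python
import random
from itertools import combinations


def edge_prob(prob, u, v):
    """Probability that edge {u, v} exists; absent pairs are assumed to have probability 0."""
    return prob.get(frozenset((u, v)), 0.0)


def expected_edit_distance(nodes, prob, clusters):
    """Expected number of edge insertions/deletions to reach the given cluster graph.

    A pair inside a cluster costs (1 - p): the edge must be added when missing.
    A pair across clusters costs p: the edge must be removed when present.
    """
    label = {v: i for i, cluster in enumerate(clusters) for v in cluster}
    cost = 0.0
    for u, v in combinations(nodes, 2):
        p = edge_prob(prob, u, v)
        cost += (1.0 - p) if label[u] == label[v] else p
    return cost


def pivot_clustering(nodes, prob, seed=0):
    """Random-pivot heuristic (an assumed stand-in, not the paper's stated method).

    Each unassigned pivot, taken in random order, forms a cluster with every
    still-unassigned node whose edge to the pivot has probability above 0.5.
    No cluster count is supplied; it emerges from the procedure.
    """
    rng = random.Random(seed)
    order = list(nodes)
    rng.shuffle(order)
    unassigned = set(order)
    clusters = []
    for pivot in order:
        if pivot not in unassigned:
            continue
        cluster = {pivot} | {
            v for v in unassigned
            if v != pivot and edge_prob(prob, pivot, v) > 0.5
        }
        unassigned -= cluster
        clusters.append(cluster)
    return clusters


# Toy probabilistic graph (invented for illustration): two likely triangles
# joined by one unlikely edge.
nodes = ["a", "b", "c", "d", "e", "f"]
prob = {
    frozenset(("a", "b")): 0.9, frozenset(("a", "c")): 0.9, frozenset(("b", "c")): 0.9,
    frozenset(("d", "e")): 0.9, frozenset(("d", "f")): 0.9, frozenset(("e", "f")): 0.9,
    frozenset(("c", "d")): 0.1,
}

clusters = pivot_clustering(nodes, prob, seed=1)
print(len(clusters), "clusters, expected edits:",
      expected_edit_distance(nodes, prob, clusters))
```

Because the objective only counts expected edge changes, no cluster count or similarity threshold has to be tuned by the user; the 0.5 cut-off in the sketch simply reflects that an edge more likely present than absent is cheaper to keep than to delete.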