Client-side web mining for community formation in peer-to-peer environments

In this paper we present a framework for forming interest-based peer-to-peer communities from client-side web browsing history. At the heart of the framework is an order-statistics-based approach that builds communities with a hierarchical structure. To address the privacy concerns of peers, we adopt cryptographic protocols that measure the similarity between peers without disclosing their personal profiles. We evaluated the framework on a distributed data mining platform that we developed. The experimental results show that the framework can effectively build interest-based communities.
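The abstract does not spell out which cryptographic protocol is used to compare profiles. As an illustration only, one common way to measure similarity without revealing profiles is a private scalar product built on an additively homomorphic cryptosystem such as Paillier. The sketch below is an assumption rather than the authors' implementation: the function names (`keygen`, `encrypt`, `decrypt`, `private_dot`), the toy primes, and the representation of a profile as a small integer vector are all hypothetical, and the key size is far too small for real use.

```python
import math
import secrets

def keygen(p=1009, q=1013):
    """Toy Paillier key pair (assumed tiny primes; real keys need a 2048-bit modulus)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c with the private key."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def private_dot(a, b):
    """Peer A learns a.b; peer B only ever sees encryptions of A's entries."""
    n, priv = keygen()                      # peer A
    enc_a = [encrypt(n, x) for x in a]      # A -> B: encrypted profile
    n2 = n * n
    acc = encrypt(n, 0)                     # B: fresh randomness hides the path
    for c, y in zip(enc_a, b):
        acc = (acc * pow(c, y, n2)) % n2    # adds a_i * b_i under encryption
    return decrypt(n, priv, acc)            # B -> A: A decrypts the sum

# Hypothetical interest vectors: 1 means the peer visited pages on that topic.
print(private_dot([1, 0, 1, 1], [1, 1, 0, 1]))
```

In this sketch peer A still learns the scalar product itself, and the result is only correct while it stays below the modulus `n`. A fuller protocol would typically have peer B add a random mask so that each side holds only a share of the result.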
