An Efficient Architecture for Information Retrieval in P2P Context Using Hypergraph

Peer-to-peer (P2P) data-sharing systems now generate a significant portion of Internet traffic and have emerged as an accepted way to share enormous volumes of data. The need for widely distributed information systems supporting virtual organizations has given rise to a new category of P2P systems called schema-based. In such systems, each peer is a database management system in its own right, exposing its own schema. In this setting, the main objective is efficient search across peer databases, processing each incoming query without consuming excessive bandwidth. The usability of these systems depends on successful techniques to find and retrieve data; however, efficient and effective routing of content-based queries is an emerging problem in P2P networks. This work is an attempt to show that mining algorithms can significantly improve the efficiency of query routing in the P2P context. Our proposed method combines clustering with hypergraphs. We use ECCLAT to build an approximate clustering and discover meaningful clusters with slight overlap. We then use the MTMINER algorithm to extract all minimal transversals of the hypergraph formed by these clusters, and use them for query routing. The set of clusters improves the robustness of the query routing mechanism and the scalability of the P2P network. We compare the performance of our method with a baseline method on the query routing problem. Our experimental results show that the proposed method achieves high levels of performance and scalability with respect to important criteria such as response time, precision and recall.
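
To make the notion of a minimal transversal concrete, the following is a minimal sketch in Python. It is not MTMINER: it is a brute-force enumeration that is only practical for very small inputs. It also assumes, for illustration only, that each hyperedge is the set of peers belonging to one cluster; the abstract does not spell out this encoding, and the peer names and the function name `minimal_transversals` are hypothetical.

```python
from itertools import combinations


def minimal_transversals(hyperedges):
    """Return all minimal transversals (hitting sets) of a hypergraph.

    Brute-force sketch, exponential in the number of vertices.
    This is NOT the MTMINER algorithm named in the abstract.
    """
    edges = [frozenset(e) for e in hyperedges]
    vertices = sorted(set().union(*edges)) if edges else []
    found = []
    # Enumerate candidates by increasing size, so that any candidate that
    # contains an already found transversal is known to be non-minimal.
    for size in range(1, len(vertices) + 1):
        for candidate in combinations(vertices, size):
            cand = frozenset(candidate)
            if any(t <= cand for t in found):
                continue  # superset of a smaller transversal: not minimal
            if all(cand & e for e in edges):
                found.append(cand)  # intersects every hyperedge
    return found


# Hypothetical, slightly overlapping clusters of peers (assumed encoding:
# one hyperedge per cluster, vertices are peer identifiers).
clusters = [{"p1", "p2"}, {"p2", "p3"}, {"p3", "p4"}]
transversals = minimal_transversals(clusters)
for t in transversals:
    print(sorted(t))
```

Under this assumed encoding, each minimal transversal is a smallest-by-inclusion set of peers that touches every cluster, which is one plausible reading of why such sets are useful as routing targets.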
