Summary Creation for Information Discovery in Distributed Systems

In current distributed systems, such as Grids, Clouds, or P2P systems, the amount of information to be handled influences the way the system is managed. In P2P systems holding large quantities of data, or in Grid systems containing a large number of (often heterogeneous) resources, information about data or resources must be spread through the system efficiently so that they can be found. An information discovery technique based on data summarization via clustering is presented. These summaries can be used to classify information, giving users greater insight into documents or computing resources than raw data would. Meta-schedulers and brokers would also benefit from the proposed technique, because they would have to handle less data from resources, which aids the scalability of the system. An evaluation of the approach is then provided to identify the impact of choosing particular parameters to be used as part of the summary.
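The following is an illustration only of how clustering can turn raw resource records into compact summaries. The sketch groups hypothetical resource descriptions, each a (CPU cores, memory in GB) tuple, with plain k-means. For each cluster it keeps only the centroid, the member count and the per-attribute bounds. Several things here are assumptions made for the example, not necessarily the algorithm or summary contents used in this work:

- the choice of k-means;
- the two attributes;
- the summary fields;
- the names `ClusterSummary` and `summarize`.

A broker holding these summaries deals with at most k records instead of one per resource.

```python
import random
from dataclasses import dataclass

Point = tuple[float, ...]

@dataclass
class ClusterSummary:
    # Hypothetical summary record; the fields kept are an assumption.
    centroid: Point  # mean attribute values of the cluster
    count: int       # number of resources the summary stands for
    lower: Point     # per-attribute minimum
    upper: Point     # per-attribute maximum

def _dist2(a: Point, b: Point) -> float:
    # Squared Euclidean distance between two attribute vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def _mean(points: list[Point]) -> Point:
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def kmeans(points: list[Point], k: int, iters: int = 100, seed: int = 0) -> list[list[Point]]:
    # Lloyd's algorithm with random initial centroids (may hit a local optimum).
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters: list[list[Point]] = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: _dist2(p, centroids[i]))
            clusters[idx].append(p)
        # Recompute centroids; an empty cluster keeps its old centroid.
        new = [_mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return clusters

def summarize(resources: list[Point], k: int, seed: int = 0) -> list[ClusterSummary]:
    # Replace raw resource records by one summary per non-empty cluster.
    summaries = []
    for cluster in kmeans(resources, k, seed=seed):
        if not cluster:
            continue
        summaries.append(ClusterSummary(
            centroid=_mean(cluster),
            count=len(cluster),
            lower=tuple(min(col) for col in zip(*cluster)),
            upper=tuple(max(col) for col in zip(*cluster)),
        ))
    return summaries

if __name__ == "__main__":
    # Hypothetical resources: (CPU cores, memory in GB).
    resources = [(2, 4), (2, 4), (4, 8), (32, 128), (32, 256), (64, 256)]
    for s in summarize(resources, k=2):
        print(s)
```

How many clusters to form, and which fields to keep per cluster, are exactly the kind of summary parameters whose impact the evaluation is concerned with. More clusters give brokers finer detail at the cost of more data to spread through the system.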