Peer-to-Peer Data Clustering in Self-Organizing Sensor Networks

This work proposes and evaluates distributed algorithms for data clustering in self-organizing ad-hoc sensor networks subject to computational, connectivity, and power constraints. Self-organization is essential in environments with a large number of devices, because such a system cannot be configured and maintained by manually adjusting its individual components. One benefit of in-network data clustering is that the network can transmit only relevant, high-level information, namely models, instead of large amounts of raw data, which also drastically reduces energy consumption. For instance, a sensor network could directly identify or anticipate extreme environmental events such as tsunamis, tornadoes, or volcanic eruptions and report only the alarm or its probability, rather than transmitting every normal wave motion via satellite. The efficiency and efficacy of the methods are evaluated by simulation, measuring network traffic and comparing the generated models with the ideal results returned by density-based clustering algorithms for centralized systems.

DOI: 10.4018/978-1-60566-328-9.ch009
