Utilizing the Hive Mind - How to Manage Knowledge in Fully Distributed Environments

By 2020, the Internet of Things will consist of 26 billion connected devices. These devices will collect vast numbers of raw observations, such as GPS positions or communication patterns. To benefit from this information, machine learning algorithms are used to derive knowledge from the gathered observations. The benefit increases further if the devices can collaborate by sharing the knowledge they have gathered. In a massively distributed environment this is not an easy task, because the knowledge on each device can be very heterogeneous and based on different numbers of observations made in diverse contexts. In this paper, we propose two strategies to route a query for specific knowledge to a device that can answer it with high confidence. To that end, we developed a confidence metric that takes into account the number and the variance of a device's observations. Our routing strategies are based on local routing tables that can either be learned over time from previous queries or be actively maintained by exchanging knowledge models. We evaluated both routing strategies on real-world and synthetic data. Our evaluations show that the knowledge retrieved by the presented approaches is up to 96.7% as accurate as the global optimum.
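To make the idea of confidence-based routing more concrete, the following is a minimal sketch and not the metric or protocol from the paper. It assumes a hypothetical confidence score that grows with the number of observations and shrinks with their variance, and a hypothetical greedy rule that forwards a query to the neighbor with the highest score recorded in a local routing table. The names `Device`, `confidence`, `refresh_routing_table`, and `route` are illustrative only.

```python
from dataclasses import dataclass, field
from statistics import pvariance
from typing import Dict, List


@dataclass(eq=False)
class Device:
    """A node holding one-dimensional observations (illustrative only)."""
    name: str
    observations: List[float] = field(default_factory=list)
    neighbors: List["Device"] = field(default_factory=list, repr=False)
    # Local routing table: neighbor name -> last known confidence.
    routing_table: Dict[str, float] = field(default_factory=dict)

    def confidence(self) -> float:
        # Assumed score: more observations raise it, higher variance lowers it.
        n = len(self.observations)
        if n == 0:
            return 0.0
        return (n / (n + 1)) / (1.0 + pvariance(self.observations))

    def refresh_routing_table(self) -> None:
        # Stand-in for actively exchanging knowledge models with neighbors.
        self.routing_table = {nb.name: nb.confidence() for nb in self.neighbors}


def route(start: Device, max_hops: int = 10) -> Device:
    """Greedily forward a query toward the most confident known device."""
    current = start
    for _ in range(max_hops):
        by_name = {nb.name: nb for nb in current.neighbors}
        if not current.routing_table:
            return current
        best_name = max(current.routing_table, key=current.routing_table.get)
        if current.routing_table[best_name] <= current.confidence():
            return current  # no known neighbor is more confident
        current = by_name[best_name]
    return current


# Example: a chain a - b - c where c has the most consistent observations.
a = Device("a", [1.0, 5.0])
b = Device("b", [2.0, 2.0, 2.0, 2.0])
c = Device("c", [2.0] * 20)
a.neighbors, b.neighbors, c.neighbors = [b], [a, c], [b]
for device in (a, b, c):
    device.refresh_routing_table()
answering_device = route(a)
```

In this toy chain the query entered at `a` is forwarded to `b` and then to `c`, where it stops because no entry in the routing table of `c` promises a higher score. A greedy rule like this can stall at a locally best device, which is one reason a real system would need more careful routing-table maintenance than this sketch shows.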
