Summary Extraction on Data Streams in Embedded Systems

More and more data is created by humans and by cyber-physical systems that have sensing, acting and networking capabilities. Together, these systems form the Internet of Things (IoT). Real-time analysis of this data may provide valuable insights into the complex inner processes of the IoT, and these insights open up new opportunities ranging from sensor monitoring to actor control. The volume and velocity of the data at the distributed nodes challenge both human and machine monitoring of the IoT. Broadcasting all measurements to a central node might exceed the network capacity, the resources at the central node, or the human attention span. Hence, data should be reduced at the local nodes so that the transmitted information can be used for efficient monitoring. Several methods aim at data summarization, ranging from clustering and aggregation to compression. Whereas most approaches transform the representation, we want to select unchanged data items from the data stream while they are being generated, directly at the cyber-physical system. The observations are selected independently of their frequencies and are meant to be transmitted efficiently. Ideally, no important measurement is missing from the selection and no redundant items are transmitted. The data summary is easy to interpret and is available in real time. We focus on submodular function maximization because of its strong theoretical background. We investigate its use for data summarization and enhance the Sieve-Streaming algorithm for data summarization on data streams such that it delivers smaller sets with high recall.
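To make the selection idea concrete, the following is a minimal sketch of the baseline Sieve-Streaming algorithm as published by Badanidiyuru et al. (KDD 2014), not the enhanced variant this work proposes. The toy objective `coverage`, the function name `sieve_streaming`, and the parameters `k` (summary size) and `eps` (threshold granularity) are assumptions chosen for illustration; any monotone submodular function could take the place of `coverage`.

```python
import math


def coverage(summary):
    """Toy monotone submodular objective (an assumption for illustration):
    the number of distinct features covered by the selected items."""
    covered = set()
    for item in summary:
        covered |= item
    return len(covered)


def sieve_streaming(stream, k, eps, f):
    """One pass over `stream`; returns at most k unchanged items.

    Keeps one candidate summary per threshold v = (1 + eps)**i with
    m <= v <= 2 * k * m, where m is the largest singleton value seen so far.
    """
    max_singleton = 0.0
    sieves = {}  # exponent i -> candidate summary for threshold (1 + eps)**i
    for e in stream:
        max_singleton = max(max_singleton, f([e]))
        if max_singleton <= 0:
            continue  # nothing of value seen yet
        lo = math.ceil(math.log(max_singleton, 1 + eps))
        hi = math.floor(math.log(2 * k * max_singleton, 1 + eps))
        # Drop thresholds that left the valid range and open new, empty ones.
        sieves = {i: sieves.get(i, []) for i in range(lo, hi + 1)}
        for i, s in sieves.items():
            if len(s) >= k:
                continue
            v = (1 + eps) ** i
            value = f(s)
            gain = f(s + [e]) - value
            # Accept e if its marginal gain meets the per-slot share still
            # needed to reach v / 2 with the remaining k - |s| slots.
            if gain >= (v / 2 - value) / (k - len(s)):
                s.append(e)
    return max(sieves.values(), key=f, default=[])
```

For a stream of feature sets such as `[frozenset({1, 2}), frozenset({2, 3}), frozenset({4, 5, 6})]`, calling `sieve_streaming(stream, k=2, eps=0.1, f=coverage)` returns at most two of the original items. Note that the baseline acceptance rule can admit an item with little marginal gain once a sieve has already passed half its threshold, which is one reason smaller summaries are of interest.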
