Intelligent instance selection of data streams for smart sensor applications

The purpose of our work is to mine streaming data from hundreds of automotive sensors in order to develop methods that minimize driver distraction from in-vehicle communication and entertainment systems such as audio/video devices, cellphones, PDAs, fax, email, and other messaging devices. Our aim is to create a safer driving environment by providing assistance in the form of warnings, or by delaying or re-routing incoming signals, when the assistance system detects that the driver is performing, or is about to perform, a critical maneuver such as passing, changing lanes, making a turn, or a sudden evasive maneuver. To accomplish this, the assistance system detects maneuvers by continuously evaluating embedded vehicle sensors, such as speed, steering, acceleration, lane distance, and many others, whose combined readings represent an instance of the “state” of the vehicle. One key issue is how to monitor many sensors with constant data streams effectively and efficiently. Data streams have unique characteristics and may produce data that is not relevant to a maneuver. We propose an adaptive sampling method that takes advantage of these characteristics, and we develop algorithms that attempt to select relevant and important instances, to determine which sensors to monitor, and to provide quick and effective responses in such mission-critical situations. This work can be extended to many similar sensor applications with data streams.
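
To make the idea of instance selection over a sensor stream concrete, the following is a minimal illustrative sketch and not the method developed in this work. It assumes each instance is a tuple of sensor readings and keeps an instance only when some sensor has moved by more than a per-sensor threshold since the last kept instance. The function name `select_instances`, the `thresholds` parameter, and the `max_gap` heartbeat are all hypothetical choices made for the illustration.

```python
from typing import Iterable, List, Optional, Sequence, Tuple


def select_instances(
    stream: Iterable[Sequence[float]],
    thresholds: Sequence[float],
    max_gap: int = 10,
) -> List[int]:
    """Return the indices of the instances kept from a stream of sensor vectors.

    An instance is kept when any sensor differs from the last kept instance
    by more than its threshold, or after max_gap consecutive instances have
    been skipped (a hypothetical heartbeat so the state never goes stale).
    """
    kept: List[int] = []
    last: Optional[Tuple[float, ...]] = None
    skipped = 0
    for i, state in enumerate(stream):
        # The first instance is always kept; later ones only on a large change.
        changed = last is None or any(
            abs(x - y) > t for x, y, t in zip(state, last, thresholds)
        )
        if changed or skipped >= max_gap:
            kept.append(i)
            last = tuple(state)
            skipped = 0
        else:
            skipped += 1
    return kept


# Hypothetical (speed, steering angle) readings: steady driving, then a turn.
readings = [(50.0, 0.0)] * 5 + [(50.0, 5.0), (50.0, 10.0)] + [(50.0, 10.0)] * 3
print(select_instances(readings, thresholds=(2.0, 1.0), max_gap=100))
```

In this sketch the steady stretches contribute almost no instances, while the steering change during the turn is retained, which is the kind of behavior one might want when only maneuver-relevant data should reach the classifier.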
