Temporal difference learning to detect unsafe system states

This paper proposes a general framework for detecting unsafe states of a system whose basic real-time parameters are captured by multiple sensors. Our approach is to learn a danger-level function that can be used to alert users in advance of dangerous situations. The main challenge in this learning problem is labelling: it is difficult to assign an objective danger level to each time step of the training data, except at collapse points, where a penalty can be assigned, and at successful ends, where a reward can be assigned. We therefore treat the danger level as the expected future reward (a penalty being regarded as a negative reward) and use temporal difference (TD) learning [2] to learn a function that approximates it. TD learning obtains the approximation by propagating the penalty or reward observable at collapse points and successful ends to the entire feature space, subject to some constraints. We apply the approach to driving-safety monitoring, although it is not limited to that application, and the experimental results demonstrate its effectiveness.
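The propagation idea can be illustrated with a minimal sketch. It assumes a toy setting that is not taken from the paper: five discrete states instead of a continuous sensor feature space, a tabular value estimate instead of a learned function approximator, and hand-written episodes. The function name `td0_danger_levels`, the step size `alpha`, the discount `gamma`, and the rewards of -1 and +1 are all hypothetical choices made for illustration.

```python
def td0_danger_levels(episodes, n_states, alpha=0.1, gamma=1.0, sweeps=200):
    """Tabular TD(0) over recorded episodes (illustrative toy, not the paper's model).

    Each episode is (states, terminal_reward): the visited states in order,
    plus the only label available, observed when the episode ends
    (negative for a collapse, positive for a successful end).
    """
    values = [0.0] * n_states
    for _ in range(sweeps):
        for states, terminal_reward in episodes:
            for t, s in enumerate(states):
                if t + 1 < len(states):
                    # No label at intermediate steps: bootstrap from the
                    # current estimate of the next state's value.
                    target = gamma * values[states[t + 1]]
                else:
                    # Last step: the observed penalty or reward.
                    target = terminal_reward
                values[s] += alpha * (target - values[s])
    return values


# Hypothetical episodes on a chain of states 0..4.
episodes = [
    ([2, 1, 0], -1.0),  # drifts towards a collapse
    ([2, 3, 4], +1.0),  # reaches a successful end
    ([1, 0], -1.0),
    ([3, 4], +1.0),
]
values = td0_danger_levels(episodes, n_states=5)
for state, v in enumerate(values):
    print(f"state {state}: expected future reward {v:+.2f}")
```

In this toy, states that only ever precede a collapse end up with a value near -1, states that only precede success end up near +1, and the ambiguous middle state settles near 0, even though no intermediate step was ever labelled. A low value can then be read as a high danger level.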

[1]  Thomas Lotze.  A Wavelet-based Anomaly Detector for Early Detection of Disease Outbreaks , 2006 .

[2]  Andrew W. Moore,et al.  Rule-based anomaly pattern detection for detecting disease outbreaks , 2002, AAAI/IAAI.

[3]  John N. Tsitsiklis,et al.  Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.

[4]  Trevor Hastie,et al.  The Elements of Statistical Learning , 2001 .

[5]  Tim Menzies,et al.  Bayesian Anomaly Detection , 2006 .

[6]  Miguel Ángel Sotelo,et al.  Real-time system for monitoring driver vigilance , 2004, Proceedings of the IEEE International Symposium on Industrial Electronics, 2005. ISIE 2005..

[7]  Milos Hauskrecht,et al.  Towards a Learning Traffic Incident Detection System , 2006 .

[8]  Jennifer Healey,et al.  Detecting stress during real-world driving tasks using physiological sensors , 2005, IEEE Transactions on Intelligent Transportation Systems.

[9]  Haifeng Chen,et al.  Discovering likely invariants of distributed transaction systems for autonomic system management , 2006, 2006 IEEE International Conference on Autonomic Computing.

[10]  Andrew W. Moore,et al.  Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..