Soft control of human physiological signals by reinforcement learning

This paper presents a reinforcement learning model based mainly on the adjustment of probabilistic transitions among states in a competitive way. The application controls the physiological signal GSR (galvanic skin resistance) using musical stimulation, through measurement of GSC (galvanic skin conductance), its inverse. The GSC signal has been studied as a measure of human physical tension. We have studied methods from the reinforcement learning and competitive learning fields in the search for system adaptation towards low galvanic conductance. We use a matrix ("dice") musical structure in which a transition from any cell in one column to any cell in the next column is valid, so that the music remains continuous, that is, transitions are not perceived. The last column can be connected back to the first column, giving an endless source of music. The agent's goal is to find musical sequences that yield lower GSC values. Note that this is a case of a nonstationary environment, since the preference for musical sequences changes over time. The experiments so far have shown the desired effect: a 30% decrease in GSC when the music is controlled intelligently by the reinforcement learning agent, compared with an agent that chooses the music at random.
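
A minimal sketch of how such a competitive adjustment of transition probabilities over a dice-music matrix could look, assuming a fixed matrix size, a reward equal to the observed drop in GSC against a slowly tracked baseline, and placeholder names (choose_cell, reinforce, play_and_measure) that are illustrative, not the authors' implementation:

```python
import random

# Sketch only: all names, sizes, and rates below are assumptions,
# not taken from the paper.
N_COLUMNS = 8         # columns of the musical matrix (assumed)
CELLS_PER_COLUMN = 4  # alternative musical cells per column (assumed)
LEARNING_RATE = 0.1   # reinforcement step (assumed)

# One probability vector per column; uniform at the start.
probs = [[1.0 / CELLS_PER_COLUMN] * CELLS_PER_COLUMN for _ in range(N_COLUMNS)]

def choose_cell(column):
    """Sample a cell index for this column from its current distribution."""
    return random.choices(range(CELLS_PER_COLUMN), weights=probs[column])[0]

def reinforce(column, cell, reward):
    """Competitive update: shift probability mass toward (or away from)
    the chosen cell according to the reward, then renormalize."""
    p = probs[column]
    p[cell] = max(1e-6, p[cell] + LEARNING_RATE * reward)
    total = sum(p)
    probs[column] = [x / total for x in p]

def play_and_measure(sequence):
    """Placeholder for playing the chosen cells and measuring GSC;
    here it just returns a random conductance value."""
    return random.uniform(0.0, 1.0)

baseline_gsc = play_and_measure([choose_cell(c) for c in range(N_COLUMNS)])
for episode in range(100):
    # One pass through the matrix; the last column wraps to the first,
    # so playback can continue indefinitely.
    sequence = [choose_cell(c) for c in range(N_COLUMNS)]
    gsc = play_and_measure(sequence)
    # Reward is the observed decrease in conductance relative to the baseline.
    reward = baseline_gsc - gsc
    for column, cell in enumerate(sequence):
        reinforce(column, cell, reward)
    # Track the baseline slowly, since the environment is nonstationary.
    baseline_gsc = 0.9 * baseline_gsc + 0.1 * gsc
```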
