We present a differentially private actor and its eligibility trace for the actor-critic approach. In this approach, the actor selects actions while interacting directly with the environment, whereas the critic only estimates state values obtained through bootstrapping. As a result, the actor's parameters reflect more detailed information about the sequence of actions taken than the critic's parameters do, and the same holds for the corresponding eligibility traces. It is therefore necessary to preserve the privacy of the actor and its eligibility trace when training on private or sensitive data. In this paper, we confirm that differential privacy methods are applicable to actors updated with the policy gradient algorithm, and we discuss the advantages of this approach relative to differentially private critic learning. In addition, we measure the cosine similarity between the eligibility trace with differential privacy applied and the non-private eligibility trace to analyze whether anonymity is appropriately protected in the differentially private actor or critic. We conduct experiments on two synthetic examples that imitate real-world problems in the medical and autonomous navigation domains, and the results confirm the feasibility of the proposed method.
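
The cosine-similarity comparison between a private and a non-private eligibility trace can be illustrated with a minimal sketch. It is not the mechanism used in the paper: the abstract does not specify one, so the sketch assumes per-step gradient clipping followed by Gaussian noise, and an accumulating trace with decay `gamma * lambda`. All function names and parameters (`clip`, `update_trace`, `compare_traces`, `max_norm`, `noise_std`) are hypothetical, and no privacy accounting is performed.

```python
import math
import random


def clip(vec, max_norm):
    """Scale vec so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= max_norm or norm == 0.0:
        return list(vec)
    return [v * max_norm / norm for v in vec]


def update_trace(trace, grad, decay):
    """Accumulating trace: z <- decay * z + grad (decay stands for gamma * lambda)."""
    return [decay * z + g for z, g in zip(trace, grad)]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either vector is zero."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def compare_traces(grads, decay=0.9, max_norm=1.0, noise_std=0.5, seed=0):
    """Build a plain and a noised trace from the same gradients and compare them.

    grads is a list of per-step policy-gradient vectors (grad log pi).
    Assumption: the private trace accumulates clipped gradients plus
    Gaussian noise with standard deviation noise_std * max_norm.
    """
    rng = random.Random(seed)
    dim = len(grads[0])
    plain = [0.0] * dim
    private = [0.0] * dim
    for grad in grads:
        plain = update_trace(plain, grad, decay)
        noisy = [g + rng.gauss(0.0, noise_std * max_norm)
                 for g in clip(grad, max_norm)]
        private = update_trace(private, noisy, decay)
    return cosine_similarity(plain, private)
```

Under these assumptions, a similarity near 1 means the noised trace still points in almost the same direction as the original, while values near 0 mean the noise dominates. With `noise_std=0.0` and gradients inside the clipping norm, the two traces coincide.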