Safety and sample efficiency are paramount concerns when applying reinforcement learning (RL) algorithms in the real world. This lecture series covers different types of model-based RL algorithms, with a focus on sample efficiency and safety. We start with a review of model-based algorithms and investigate sample-efficient algorithms that exploit the structure of the problem. The follow-up lectures examine safety from two perspectives: one where a reasonable level of performance must be guaranteed, and another where safety constraints may never be violated.

In the first setting, the RL agent only has access to a fixed dataset of past trajectories and does not interact directly with the environment. Assuming the behavior policy that collected the data is known, the challenge is to compute a policy that outperforms it. Here, we revisit structured problems to improve the sample efficiency of these offline algorithms.

In the second setting, the RL agent is trained by interacting directly with the environment, but it is subject to safety constraints and therefore cannot perform the random exploration typical of online RL algorithms. Assuming the agent has access to an abstraction of the safety dynamics, we investigate algorithms that explore the environment safely and eventually converge to an optimal policy.
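As a toy illustration of the first (offline) setting, the sketch below evaluates a candidate policy from a fixed logged dataset using ordinary importance sampling with a known behavior policy. This is a minimal example for intuition only, not material from the lectures: the two-armed bandit, the specific policies, and the reward probabilities are all hypothetical.

```python
import random

random.seed(0)

# Hypothetical one-step problem: a two-armed bandit.
behavior_policy = {0: 0.8, 1: 0.2}  # known logging policy P(action)
target_policy = {0: 0.2, 1: 0.8}    # candidate policy to evaluate
true_reward = {0: 0.3, 1: 0.7}      # hidden mean reward of each action

# Fixed dataset of (action, reward) pairs collected by the behavior
# policy; the agent never interacts with the environment again.
dataset = []
for _ in range(10_000):
    a = 0 if random.random() < behavior_policy[0] else 1
    r = 1.0 if random.random() < true_reward[a] else 0.0
    dataset.append((a, r))

# Ordinary importance sampling: reweight each logged reward by the
# likelihood ratio target_policy(a) / behavior_policy(a), so the
# average estimates the target policy's value from off-policy data.
estimate = sum(target_policy[a] / behavior_policy[a] * r
               for a, r in dataset) / len(dataset)

# The true value of the target policy is 0.2*0.3 + 0.8*0.7 = 0.62;
# the estimate should land close to it.
print(f"estimated value of target policy: {estimate:.3f}")
```

The same reweighting idea underlies off-policy evaluation in the sequential case, where the importance weight becomes a product of per-step ratios along each trajectory (at the cost of much higher variance).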
December 14, 8.30-10.30 (Lab Alfa)
December 15, 8.30-10.30 (Room C)
December 16, 8.30-10.30 (Room T0.5)
The minicourse is related to the "Reinforcement Learning" course (Master in Artificial Intelligence).