Transfer Learning in Value-based Methods with Successor Features
Repository Usage Stats
This dissertation investigates the concept of transfer learning in a reinforcement learning (RL) context. Transfer learning is based on the idea that it is possible for an agent to use what it has learned in one task to improve the learning process in another task as compared to learning from scratch. This improvement can take multiple forms, such as reducing the number of samples required to reach a given level of performance or even increasing the best performance achieved. In particular, we examine properties and applications of successor features, which are a useful representation that allows efficient calculation of action-value functions for a given policy in different contexts.
Our first contribution is a method for incremental construction of a cache of policies for a family of tasks. When a family of tasks share transition dynamics but differ in reward function, successor features allow us to efficiently compute the action-value functions for known policies in new tasks. As the optimal policy for a new task might be the same as or similar to that for a previous task, it is not always necessary for an agent to learn a new policy for each new task it encounters, especially if it is allowed some amount of suboptimality. We present new bounds for the performance of optimal policies in a new task, as well as an approach to use these bounds to decide, when presented with a new task, whether to use cached policies or learn a new policy.
In our second contribution, we examine the problem of hierarchical reinforcement learning, which involves breaking a task down into smaller subtasks which are easier to solve, through the lens of transfer learning. Within a single task, a subtask may encapsulate a behavior which could be used multiple times for completing the task, but occur in different contexts, such as opening doors while navigating a building. When the reward function changes between tasks, a given subtask may be unaffected, i.e., the optimal behavior within that subtask may remain the same. If so, the behavior may be immediately reused to accelerate training of behaviors for other subtasks. In both of these cases, reusing the learned behavior can be viewed as a transfer learning problem. We introduce a method based on the MAXQ value function decomposition which uses two applications of successor features to facilitate both transfer within a task and transfer between tasks with different reward functions.
The final contribution of this dissertation introduces a method for transfer using a value-based approach in domains with continuous actions. When an environment's action space is continuous, finding the action which maximizes an action-value function approximator efficiently often requires defining a constrained approximator which results in suboptimal behavior. Recently the RBF-DQN approach was proposed to use deep radial-basis value functions to allow efficient maximization of an action-value approximator over the actions while not losing the universal approximator property of neural networks. We present a method which extends this approach to use successor features in order to allow for effective transfer learning between tasks which differ in reward function.
Nemecek, Mark William (2023). Transfer Learning in Value-based Methods with Successor Features. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/29204.
Dukes student scholarship is made available to the public using a Creative Commons Attribution / Non-commercial / No derivative (CC-BY-NC-ND) license.