dc.description.abstract |
<p>As a growing number of agents are deployed in complex environments for scientific
research and human well-being, there are increasing demands for designing efficient
learning algorithms for these agents to improve their control policies. Such policies
must account for uncertainties, including those caused by environmental stochasticity,
sensor noise and communication restrictions. These challenges exist in missions such
as planetary navigation, forest firefighting, and underwater exploration. Ideally,
good control policies should allow the agents to handle any situation that arises in the
environment and enable them to accomplish their mission within the budgeted time and
resources. However, an accurate model of the environment is typically not available
in advance, requiring the policy to be learned from data. Model-free reinforcement
learning (RL) is a promising candidate for agents to learn control policies while
engaged in complex tasks, because it allows control policies to be learned directly
from a subset of experiences in a time-efficient manner. Moreover, to ensure persistent
performance improvement for RL, it is important that the control policies be concisely
represented based on existing knowledge, and have the flexibility to accommodate new
experience. Bayesian nonparametric methods (BNPMs) allow model complexity to adapt
to the data and provide a principled way to discover and represent
new knowledge.</p><p>In this thesis, we investigate approaches to RL in centralized
and decentralized sequential decision-making problems using BNPMs. We show how the
control policies can be learned efficiently under model-free RL schemes with BNPMs.
Specifically, for centralized sequential decision-making, we study Q-learning with
Gaussian processes to solve Markov decision processes, and we also employ hierarchical
Dirichlet processes as the prior for the control policy parameters to solve partially
observable Markov decision processes. For decentralized partially observable Markov
decision processes, we use stick-breaking processes as the prior for the controller
of each agent. We develop efficient inference algorithms for learning the corresponding
control policies. We demonstrate that by combining model-free RL and BNPMs with efficient
algorithm design, we are able to scale RL methods to complex problems that cannot
otherwise be solved due to the lack of model knowledge. We adaptively learn control policies
with concise structure and high value, from a relatively small amount of data.</p>
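<p>For a concrete flavor of the centralized case discussed above, the sketch below shows one way Q-learning with a Gaussian process as the Q-function approximator might look. It is a minimal illustration under stated assumptions, not the algorithm developed in the thesis: it uses scikit-learn's GaussianProcessRegressor and assumes a hypothetical discrete-action environment exposing reset() and step(), with states represented as 1-D numeric vectors; all hyperparameters are arbitrary.</p>

```python
# Illustrative sketch only (not the thesis algorithm): tabular-style Q-learning
# where the Q-function over (state, action) pairs is approximated by a Gaussian
# process regressor. Assumes a hypothetical env with reset() -> state and
# step(action) -> (next_state, reward, done), and 1-D numeric state vectors.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def gp_q_learning(env, n_actions, episodes=50, gamma=0.99, eps=0.1):
    X, y = [], []  # (state, action) inputs and bootstrapped Q-value targets
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-2)

    def q_values(state):
        # Posterior mean of Q(state, a) for every action; zeros before any data.
        if not X:
            return np.zeros(n_actions)
        queries = np.array([np.append(state, a) for a in range(n_actions)])
        return gp.predict(queries)

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection from the GP posterior mean.
            if np.random.rand() < eps:
                action = np.random.randint(n_actions)
            else:
                action = int(np.argmax(q_values(state)))
            next_state, reward, done = env.step(action)

            # One-step Q-learning target, bootstrapped from the current GP.
            target = reward + (0.0 if done else gamma * np.max(q_values(next_state)))
            X.append(np.append(state, action))
            y.append(target)
            gp.fit(np.array(X), np.array(y))  # refit on all data; fine for small problems
            state = next_state
    return gp
```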