Show simple item record

Efficient Bayesian Nonparametric Methods for Model-Free Reinforcement Learning in Centralized and Decentralized Sequential Environments

dc.contributor.advisor Carin, Lawrence Liu, Miao 2014-08-27T15:20:54Z 2015-08-22T04:30:04Z 2014
dc.description.abstract <p>As a growing number of agents are deployed in complex environments for scientific research and human well-being, there are increasing demands for designing efficient learning algorithms for these agents to improve their control polices. Such policies must account for uncertainties, including those caused by environmental stochasticity, sensor noise and communication restrictions. These challenges exist in missions such as planetary navigation, forest firefighting, and underwater exploration. Ideally, good control policies should allow the agents to deal with all the situations in an environment and enable them to accomplish their mission within the budgeted time and resources. However, a correct model of the environment is not typically available in advance, requiring the policy to be learned from data. Model-free reinforcement learning (RL) is a promising candidate for agents to learn control policies while engaged in complex tasks, because it allows the control policies to be learned directly from a subset of experiences and with time efficiency. Moreover, to ensure persistent performance improvement for RL, it is important that the control policies be concisely represented based on existing knowledge, and have the flexibility to accommodate new experience. Bayesian nonparametric methods (BNPMs) both allow the complexity of models to be adaptive to data, and provide a principled way for discovering and representing new knowledge.</p><p>In this thesis, we investigate approaches for RL in centralized and decentralized sequential decision-making problems using BNPMs. We show how the control policies can be learned efficiently under model-free RL schemes with BNPMs. Specifically, for centralized sequential decision-making, we study Q learning with Gaussian processes to solve Markov decision processes, and we also employ hierarchical Dirichlet processes as the prior for the control policy parameters to solve partially observable Markov decision processes. For decentralized partially observable Markov decision processes, we use stick-breaking processes as the prior for the controller of each agent. We develop efficient inference algorithms for learning the corresponding control policies. We demonstrate that by combining model-free RL and BNPMs with efficient algorithm design, we are able to scale up RL methods for complex problems that cannot be solved due to the lack of model knowledge. We adaptively learn control policies with concise structure and high value, from a relatively small amount of data.</p>
dc.subject Electrical engineering
dc.subject Computer engineering
dc.subject Bayeisan nonparametric methods
dc.subject Decentralized POMDP
dc.subject Finite state controller
dc.subject Gaussian process
dc.subject POMDP
dc.subject reinforcement learning
dc.title Efficient Bayesian Nonparametric Methods for Model-Free Reinforcement Learning in Centralized and Decentralized Sequential Environments
dc.type Dissertation
dc.department Electrical and Computer Engineering
duke.embargo.months 12

Files in this item


This item appears in the following Collection(s)

Show simple item record