Ensembling and Coordination Methods for Sequential Decision Making Under Uncertainty

dc.contributor.advisor: Laber, Eric
dc.contributor.author: Lawson, Joseph Michael
dc.date.accessioned: 2025-01-08T17:44:54Z
dc.date.issued: 2024
dc.department: Statistical Science

dc.description.abstract: Sequential decision making under uncertainty is ubiquitous in statistics and in human affairs more broadly. When observations are affected by the decisions an agent makes, special care is required in formulating algorithms that perform well. Adaptive clinical trials are a classic example and highlight the importance of this field; the burgeoning field of mobile health exhibits the same structure. This work addresses bandit modeling, providing methods for ensembling bandit algorithms and for coordinating simultaneous decisions with bandit feedback. We provide theoretical justification for our methods as well as empirical support in the form of simulation studies, demonstrating favorable performance against EXP4 and CORRAL in the case of our ensembling techniques and against BaSE in the case of our coordination algorithm. Our ensembling methods work by formulating a bandit model as a Markov Decision Process and applying policy improvement techniques from reinforcement learning to obtain policy improvement results. We provide numerous Bayesian and frequentist characterizations of performance. As part of implementing our proposed algorithms, we propose and prove consistency for a novel inverse probability weighted kernel dynamic programming estimator for Markov Decision Processes that accommodates adaptively collected data, which can improve efficiency in expensive simulation environments such as ours. Our ensembling methods can be especially useful in settings with short horizons and expensive observations, such as planning site selection over time: because each decision is costly, computationally intensive planning methods such as ours are justified. We also provide deterministic versions of our methods, which is important for stakeholders who do not wish to employ randomized algorithms. For coordination, we propose a "MaxiMin" variant of the epsilon-Greedy algorithm that performs favorably in simultaneous decision settings, and we prove a frequentist regret guarantee, further validating the approach. Clinical trials can occur in "batches," where treatment decisions must be made simultaneously for a cohort of participants; settings with delayed treatment response are effectively batched as well. It is thus crucial to coordinate these simultaneous decisions to maximize the information gained.

dc.identifier.uri

https://hdl.handle.net/10161/31961

dc.rights.uri

https://creativecommons.org/licenses/by-nc-nd/4.0/

dc.subject

Statistics

dc.subject

Bandits

dc.subject

Bayesian

dc.subject

Reinforcement Learning

dc.title: Ensembling and Coordination Methods for Sequential Decision Making Under Uncertainty
dc.type: Dissertation
duke.embargo.months: 20
duke.embargo.release: 2026-09-08T17:44:54Z
