Ensembling and Coordination Methods for Sequential Decision Making Under Uncertainty

dc.contributor.advisor: Laber, Eric
dc.contributor.author: Lawson, Joseph Michael
dc.date.accessioned: 2025-01-08T17:44:54Z
dc.date.issued: 2024
dc.department: Statistical Science

dc.description.abstract: Sequential decision making under uncertainty is ubiquitous in statistics and in human affairs more broadly. When observations are affected by the decisions an agent makes, special care is required in formulating algorithms that perform well. Adaptive clinical trials are a classic example and highlight the importance of this field; the burgeoning field of mobile health exhibits the same structure. This work addresses bandit modeling, providing methods for ensembling bandit algorithms and for coordinating simultaneous decisions with bandit feedback. We provide theoretical justification for our methods as well as empirical support in the form of simulation studies, demonstrating favorable performance against EXP4 and CORRAL in the case of our ensembling techniques and against BaSE in the case of our coordination algorithm. Our ensembling methods work by formulating a bandit model as a Markov Decision Process and applying policy improvement techniques from reinforcement learning to obtain policy improvement results. We provide numerous Bayesian and frequentist characterizations of performance. As part of implementing our proposed algorithms, we propose and prove consistency for a novel inverse probability weighted kernel dynamic programming estimator for Markov Decision Processes that accommodates adaptively collected data, which can improve efficiency in expensive simulation environments such as ours. Our ensembling methods can be especially useful in settings with short horizons and expensive observations, such as planning site selection over time: because each decision is costly, computationally intensive planning methods such as ours are justified. We also provide deterministic versions of our methods, which is important for stakeholders who do not wish to employ randomized algorithms. For coordination, we propose a "MaxiMin" variant of the epsilon-Greedy algorithm that performs favorably in simultaneous decision settings, and we prove a frequentist regret guarantee, further validating the approach. Clinical trials can occur in "batches," where treatment decisions must be made simultaneously for a cohort of participants; settings with delayed treatment response are effectively batched as well. It is thus crucial to coordinate these simultaneous decisions to maximize the information gained.

dc.identifier.uri

https://hdl.handle.net/10161/31961

dc.rights.uri

https://creativecommons.org/licenses/by-nc-nd/4.0/

dc.subject

Statistics

dc.subject

Bandits

dc.subject

Bayesian

dc.subject

Reinforcement Learning

dc.title: Ensembling and Coordination Methods for Sequential Decision Making Under Uncertainty
dc.type: Dissertation
duke.embargo.months: 20
duke.embargo.release: 2026-09-08T17:44:54Z
