Ensembling and Coordination Methods for Sequential Decision Making Under Uncertainty

Limited Access
This item is unavailable until 2026-09-08

Date

2024

Abstract

Sequential decision making under uncertainty is ubiquitous in statistics and in human affairs more broadly. When observations are affected by the decisions an agent makes, special care is required in formulating algorithms that perform well. Adaptive clinical trials are a classic example and highlight the importance of this field; relatedly, the burgeoning field of mobile health exhibits the same structure. This work addresses the field of bandit modeling, providing methods for ensembling bandit algorithms and for coordinating simultaneous decisions with bandit feedback. We provide theoretical guarantees for our methods along with empirical justification in the form of simulation studies, which demonstrate favorable performance against EXP4 and CORRAL for our ensembling techniques and against BaSE for our coordination algorithm. Our ensembling methods work by formulating a bandit model as a Markov Decision Process and applying policy improvement techniques from reinforcement learning to obtain policy improvement guarantees. We provide several Bayesian and frequentist characterizations of performance. As part of implementing our proposed algorithms, we propose, and prove consistency for, a novel inverse probability weighted kernel dynamic programming estimator for Markov Decision Processes that accommodates adaptively collected data and can improve efficiency in expensive simulation environments such as ours. Our ensembling methods can be especially useful in settings with short horizons and expensive observations, such as planning site selection over time: because each decision is costly, computationally intensive planning methods such as ours are justified. We also provide versions of our methods that are deterministic, which is important for stakeholders who do not wish to employ randomized algorithms. Our coordination algorithm is a "MaxiMin" variant of the epsilon-Greedy algorithm that performs favorably in simultaneous-decision settings, and we prove a frequentist regret guarantee, further validating our approach. Clinical trials can occur in "batches," where treatment decisions must be made for a cohort of participants simultaneously, and settings with delayed treatment response are effectively batched as well. It is thus crucial to coordinate these simultaneous decisions effectively so as to maximize the gain in information.
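
The full dissertation is under embargo, so the MaxiMin rule itself is not reproduced here. As a minimal sketch of the batched-bandit setting the abstract describes, the Python snippet below assigns arms to an entire cohort before any of its rewards are observed, using a plain epsilon-Greedy rule rather than the dissertation's coordination algorithm; the function name, reward model, and default parameters are hypothetical illustrations, not taken from the work.

import numpy as np

def batched_epsilon_greedy(true_means, horizon=10, batch_size=50, epsilon=0.1, seed=0):
    """Plain batched epsilon-Greedy bandit (illustrative only, not the
    dissertation's MaxiMin algorithm): every participant in a batch is
    assigned an arm simultaneously, and rewards are only observed after
    the whole batch completes, mimicking delayed treatment response."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    counts = np.zeros(k)   # pulls per arm so far
    sums = np.zeros(k)     # summed rewards per arm so far
    total_reward = 0.0

    for _ in range(horizon):
        # Decide the whole batch before seeing any of its rewards.
        estimates = np.where(counts > 0, sums / np.maximum(counts, 1), 0.0)
        greedy_arm = int(np.argmax(estimates))
        explore = rng.random(batch_size) < epsilon
        arms = np.where(explore, rng.integers(0, k, size=batch_size), greedy_arm)

        # Observe all rewards for the batch at once, then update estimates.
        rewards = rng.normal(true_means[arms], 1.0)
        np.add.at(counts, arms, 1)
        np.add.at(sums, arms, rewards)
        total_reward += rewards.sum()

    # Crude regret estimate against always playing the best arm.
    return horizon * batch_size * max(true_means) - total_reward

# Example: three arms with a clearly best arm of mean 1.0.
print(batched_epsilon_greedy(np.array([0.2, 0.5, 1.0])))

Because all assignments in a batch are fixed before any feedback arrives, the within-batch allocation itself carries the burden of information gathering, which is the coordination problem the abstract's MaxiMin variant is designed to address.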

Citation

Lawson, Joseph Michael (2024). Ensembling and Coordination Methods for Sequential Decision Making Under Uncertainty. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/31961.

Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.