Ensembling and Coordination Methods for Sequential Decision Making Under Uncertainty
Date
2024
Authors
Advisors
Journal Title
Journal ISSN
Volume Title
Repository Usage Stats
views
downloads
Abstract
Sequential decision making under uncertainty is ubiquitous in statistics and inhuman affairs more broadly. When observations are affected by the decisions an agent makes, special care is required in formulating algorithms that perform well. Adaptive clinical trials are a classic example and highlight the importance of this field. Relatedly, the burgeoning field of mobile health exhibits this structure. This work addresses the field of bandit modeling, providing methods for ensembling bandit algorithms and for coordinating simultaneous decisions with bandit feedback. We provide both theoretical and empirical justifications for our methods in the form of simulation studies, demonstrating favorable empirical performance against EXP4 and CORRAL in the case of our ensembling techniques and against BaSE in the case of our coordination algorithm. Our ensembling methods work by formulating a bandit model as a Markov Decision Process and using policy improvement techniques from reinforcement learning to achieve policy improvement results. We provide numerous Bayesian and frequentist characterization of performance. As part of implementing our proposed algorithms, we proposes and prove consistency for a novel inverse probability weighted kernel dynamic programming estimator for Markov Decision Processes that allows for adaptively collected data, which can improve efficiency in expensive simulation environments such as our. Our ensembling methods can be especially useful in settings with short horizons and expensive observations, such as planning site selection over time. This is because each decision is costly, and computationally intensive planning methods such as ours are justified. We also provide versions of our methods that our deterministic, which is important for many stakeholders who do not wish to employ randomized algorithms. Our coordination algorithm proposes a "MaxiMin" variant of the epsilon-Greedy algorithm which performs favorably in simultaneous decision settings. We prove a frequentist regret guarantee, further validating our approach. Clinical trials can occur in "batches," where treatment decisions must be made for a cohort of participants simultaneously. In settings with delayed treatment response this can also be effectively true. It is thus crucial to effectively coordinate these simultaneous decisions to improve the gain in information.
Type
Department
Description
Provenance
Citation
Permalink
Citation
Lawson, Joseph Michael (2024). Ensembling and Coordination Methods for Sequential Decision Making Under Uncertainty. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/31961.
Collections
Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.