Risk-Averse and Distributionally Robust Optimization for Online and Offline Decision-Making

Date

2025

Authors

Shen, Yi

Abstract

The deployment of machine learning models in high-stakes domains, such as healthcare, calls for algorithms that not only optimize the expected performance, but also account for risks and uncertainties. This dissertation develops risk-sensitive optimization frameworks tailored for environments confronted with uncertainty and distribution shifts in both online and offline settings.

In the first study, we address online stochastic games with risk-averse agents whose objective is to minimize the risk of incurring excessively high costs, as measured by the Conditional Value at Risk (CVaR). The primary difficulty is that the distributions of the cost functions are unknown and depend on the unobservable actions of all agents. To overcome this, we propose a novel online risk-averse learning algorithm that uses a one-point zeroth-order estimate of the CVaR gradient. Our theoretical analysis shows that this algorithm achieves sublinear regret with high probability. We further improve performance through two variants: one employs a new sampling strategy that reuses samples from the previous iteration to improve CVaR estimation accuracy, and the other uses residual feedback to reduce the variance of the gradient estimates. The practical efficacy of these approaches is validated through extensive simulations of an online market problem modeled as a Cournot game.
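As a rough illustration (not the dissertation's exact algorithm), the sketch below shows one step of a one-point zeroth-order update on the Rockafellar-Uryasev CVaR surrogate C(x, t) = t + E[(J(x) - t)_+] / alpha; the names cost_oracle, eta_x, eta_t, and the smoothing radius delta are all hypothetical.

    import numpy as np

    def cvar_zo_step(x, t, alpha, delta, eta_x, eta_t, cost_oracle, rng):
        # One gradient step on the Rockafellar-Uryasev surrogate
        # C(x, t) = t + E[(J(x) - t)_+] / alpha, using a single noisy
        # cost observation per round (one-point bandit feedback).
        d = x.size
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)             # random direction on the unit sphere
        cost = cost_oracle(x + delta * u)  # the only function evaluation this round
        # One-point zeroth-order estimate of the x-gradient of E[(J(x) - t)_+] / alpha.
        g_x = (d / delta) * (max(cost - t, 0.0) / alpha) * u
        # Sample subgradient of C in t: 1 - (1/alpha) * 1{J > t}.
        g_t = 1.0 - (1.0 / alpha) * float(cost > t)
        return x - eta_x * g_x, t - eta_t * g_t

Here t tracks the value-at-risk level, so the pair (x, t) descends the CVaR surrogate from bandit feedback alone; the residual-feedback variant mentioned above would instead use the difference between successive observations to reduce the variance of g_x.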

Next, we explore risk-averse CVaR multi-armed bandit (MAB) problems, formulated as a transfer learning problem between an expert and a learner agent. We specifically address scenarios where contexts observable to the expert act as unobserved confounders from the learner's perspective. We first develop a mixed-integer linear program that utilizes expert data to establish causal bounds on the CVaR of true returns for all possible unobserved confounders. We then transfer these causal bounds to the learner through a novel causal bound constrained Upper Confidence Bound (UCB) algorithm. This approach significantly reduces the exploration variance, enabling faster identification of the minimum-risk arm with fewer samples. Our regret analysis shows that the algorithm can achieve zero or constant regret under appropriate conditions. The effectiveness of the methodology is demonstrated through an emotion regulation application in mobile health, where our approach consistently outperforms standard risk-averse MAB methods that lack causal bound integration.
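A minimal sketch of an arm-selection rule under causal bounds, assuming cost minimization (so optimism means a lower confidence index on the CVaR); the array names and the clipping rule are illustrative, not the exact algorithm analyzed in this study.

    import numpy as np

    def pick_arm(cvar_hat, counts, t, causal_lo, causal_hi, c=2.0):
        # Optimistic (lower-confidence) index on each arm's CVaR cost.
        width = np.sqrt(c * np.log(t + 1) / np.maximum(counts, 1))
        index = cvar_hat - width
        # Truncate each index with the causal bounds derived from expert data:
        # an arm can never look more promising than its causal lower bound
        # allows, which suppresses needless exploration of dominated arms.
        index = np.clip(index, causal_lo, causal_hi)
        return int(np.argmin(index))

Intuitively, when one arm's causal upper bound lies below every other arm's causal lower bound, the clipped indices identify the minimum-risk arm almost immediately, which is the mechanism behind the zero- or constant-regret regimes.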

In our third study, we tackle the critical challenge of off-policy evaluation and learning when the environment during data collection differs from the environment during policy execution. We introduce a distributionally robust optimization (DRO) framework based on the Wasserstein distance, addressing fundamental limitations of traditional KL-divergence approaches, which cannot account for distributions with out-of-sample support and are blind to the geometry of the support. While Wasserstein DRO typically incurs higher computational costs, we develop a regularized method and a practical stochastic gradient descent approach that solve the DRO problem efficiently. Our theoretical analysis provides finite sample complexity and iteration complexity guarantees for the proposed method. We further validate our approach using a public dataset recorded in a randomized stroke trial.
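As a hedged illustration of the regularized idea, the sketch below runs plain SGD on a standard dual-norm surrogate for Wasserstein DRO of the logistic loss (empirical risk plus rho times the parameter norm); the surrogate, the function name wdro_sgd, and the hyperparameters are generic assumptions, not the dissertation's specific method.

    import numpy as np

    def wdro_sgd(X, y, rho=0.1, lr=0.01, epochs=50, seed=0):
        # SGD on a dual-norm surrogate of Wasserstein DRO for logistic loss:
        #   minimize (1/n) sum_i log(1 + exp(-y_i x_i' theta)) + rho * ||theta||_2,
        # with labels y in {-1, +1}. For losses Lipschitz in the features, the
        # worst case over a type-1 Wasserstein ball of radius rho is bounded
        # by this regularized empirical risk.
        rng = np.random.default_rng(seed)
        n, d = X.shape
        theta = np.zeros(d)
        for _ in range(epochs):
            for i in rng.permutation(n):
                z = y[i] * (X[i] @ theta)
                grad = -y[i] * X[i] / (1.0 + np.exp(z))          # logistic loss gradient
                norm = np.linalg.norm(theta)
                reg = theta / norm if norm > 0 else np.zeros(d)  # subgradient of ||theta||_2
                theta -= lr * (grad + rho * reg)
        return theta

The dual-norm penalty reflects a well-known bound: the inner supremum over the Wasserstein ball is controlled by the empirical risk plus rho times the dual norm of the loss gradient in the features, which turns the DRO problem into an ordinary regularized objective amenable to SGD.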

Finally, we propose a DRO framework that relieson a new distance inspired by Unbalanced Optimal Transport (UOT). The proposed UOT distance employs a soft penalization term instead of hard constraints, enabling the construction of an ambiguity set that is more resilient to outliers. Under smoothness conditions, we establish strong duality of the proposed DRO problem. Moreover, we introduce a computationally efficient Lagrangian penalty formulation for which we show that strong duality also holds. We provide empirical results that demonstrate that our method offers improved robustness to outliers and is computationally less demanding for regression and classification tasks.
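Schematically, and with notation assumed rather than taken from the text, a UOT-type distance relaxes the marginal constraints of optimal transport with divergence penalties, and the Lagrangian form penalizes the distance itself instead of imposing a hard ambiguity ball:

    \[
      D_{\mathrm{UOT}}(Q, P) \;=\; \inf_{\pi \ge 0}
      \int c \,\mathrm{d}\pi
      \;+\; \tau_1\, D_{\varphi}\!\left(\pi_1 \,\middle\|\, Q\right)
      \;+\; \tau_2\, D_{\varphi}\!\left(\pi_2 \,\middle\|\, P\right),
      \qquad
      \sup_{Q}\;\; \mathbb{E}_{Q}\!\left[\ell_\theta\right]
      \;-\; \lambda\, D_{\mathrm{UOT}}\!\left(Q, \hat{P}_n\right).
    \]

Because the marginal penalties (with weights tau_1, tau_2) are soft, the cost of matching any single aberrant sample stays bounded, so a few outliers cannot inflate the ambiguity set; the second display is the Lagrangian penalty formulation for which strong duality is shown.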

Together, these studies offer principled approaches to decision-making under uncertainty, bridging theoretical advances with practical implementations and equipping practitioners with robust tools for high-stakes applications such as online markets and healthcare systems.

All the contents presented in this dissertation have been previously published in peer-reviewed venues.

Subjects

Artificial intelligence, Computer science, Engineering, Contextual Bandits, Distributionally Robust Optimization, Online Learning, Optimal Transport, Risk-averse Optimization

Citation

Shen, Yi (2025). Risk-Averse and Distributionally Robust Optimization for Online and Offline Decision-Making. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/33317.
