Optimal and Near-optimal Policies of Sequential Decision Problems under Uncertainty
Abstract
In this dissertation, we study stochastic sequential decision problems with a discrete finite time horizon. We conduct substantial analyses of both optimal and near-optimal policies for such problems, and we examine the total reward collected by different sequential policies from several perspectives. The total reward collected by any sequential policy is a priori random, and for many of the policies we study, we are interested in: (i) the expected total reward they collect, (ii) the variance of their total reward, (iii) the limiting distribution of their total reward, and (iv) confidence intervals for their total reward. With these goals in mind, we consider several sequential decision problems: the dynamic and stochastic knapsack problem with equal rewards, the problem of sequential monotone subsequence selection from a random sample, and a dynamic inventory control problem. For the dynamic and stochastic knapsack problem with equal rewards, we propose an adaptive heuristic policy and prove that its regret---the expected gap between the total reward collected by the proposed heuristic policy and that collected by the best offline solution---grows at most logarithmically in the problem size. Furthermore, we characterize the variance asymptotics as well as the limiting distribution of the total reward collected by this heuristic. We show that the total reward collected by our heuristic policy has the same variance asymptotics and the same limiting distribution as that collected by the optimal sequential policy. This is in contrast with the performance of other asymptotically optimal heuristics, whose total rewards have larger regrets, larger variances, and different limiting distributions. We also discuss the equivalence between the problem of sequential monotone subsequence selection from a random sample and a special instance of the dynamic and stochastic knapsack problem with equal rewards.
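To make the knapsack setting concrete, the sketch below simulates an adaptive threshold policy for the equal-rewards knapsack with Uniform(0, 1) item sizes. The square-root threshold is an illustrative choice motivated by a fluid approximation and is not necessarily the exact heuristic analyzed in the dissertation.

```python
import math
import random

def knapsack_equal_rewards(sizes, capacity):
    """Adaptive threshold heuristic for the dynamic and stochastic
    knapsack problem with equal rewards: items arrive one at a time,
    every accepted item earns one unit of reward, and the accepted
    sizes must fit within the capacity.

    The threshold sqrt(2 * remaining_capacity / periods_left) is an
    illustrative choice: in a fluid approximation it spreads the
    remaining capacity evenly over the remaining periods.
    """
    n = len(sizes)
    remaining = capacity
    accepted = 0
    for i, size in enumerate(sizes):
        periods_left = n - i  # includes the current item
        threshold = min(1.0, math.sqrt(2.0 * remaining / periods_left))
        if size <= threshold and size <= remaining:
            accepted += 1
            remaining -= size
    return accepted

# With Uniform(0, 1) sizes, the best offline total reward is roughly
# sqrt(2 * capacity * n), so the heuristic should land near that value.
random.seed(0)
n, capacity = 10_000, 100.0
total_reward = knapsack_equal_rewards([random.random() for _ in range(n)], capacity)
print(total_reward)  # near sqrt(2 * 100 * 10_000) ~ 1414
```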
This equivalence enables us to apply the above analyses and results to the sequential monotone subsequence selection problem and to establish a number of results of independent interest. Lastly, we study multiple finite-horizon sequential decision problems that are faced in parallel by self-interested decision makers. The uncertainties they face may be correlated in an arbitrary, unknown fashion. Our goal is to construct simultaneous confidence intervals for the total reward collected by each decision maker, using observations of the rewards they collect over the time horizon. We propose a data-driven bootstrap procedure that is asymptotically valid. A key feature of this procedure is that the number of decision makers is allowed to be much larger than the length of the time horizon; this corresponds to the high-dimensional regime in the statistics literature, in which the dimension of the estimator is much larger than the sample size. As a byproduct, the bootstrap procedure can be used to simultaneously test whether each decision maker is implementing a specified policy. We use a dynamic inventory control problem and the sequential monotone subsequence selection problem to illustrate the effectiveness of our framework.
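In the same spirit, the sequential monotone subsequence selection problem can be simulated with an analogous adaptive threshold on the increment over the last accepted value; the headroom 1 - last plays the role of the remaining knapsack capacity. The specific threshold is again an illustrative assumption, not the dissertation's exact policy.

```python
import math
import random

def select_increasing(xs):
    """Sequentially select an increasing subsequence from the sample
    xs (values in [0, 1], observed one at a time, decisions final).

    Accept an observation when its increment over the last accepted
    value is nonnegative and at most sqrt(2 * headroom / periods_left),
    where headroom = 1 - (last accepted value); this mirrors the
    knapsack threshold with headroom playing the role of capacity.
    """
    n = len(xs)
    last = 0.0
    chosen = []
    for i, x in enumerate(xs):
        periods_left = n - i
        threshold = math.sqrt(2.0 * (1.0 - last) / periods_left)
        if 0.0 <= x - last <= threshold:
            chosen.append(x)
            last = x
    return chosen

# For n i.i.d. Uniform(0, 1) observations, the optimal expected
# length of a sequentially selected increasing subsequence is about
# sqrt(2 * n).
random.seed(0)
subseq = select_increasing([random.random() for _ in range(10_000)])
print(len(subseq))  # near sqrt(20_000) ~ 141
```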
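The simultaneous confidence intervals can be illustrated with a generic Gaussian-multiplier bootstrap for the totals of p reward streams observed over T periods, with p possibly much larger than T. This is a sketch in the spirit of high-dimensional bootstrap methods, not the dissertation's exact procedure; the studentization and the choice of multipliers are assumptions made for illustration.

```python
import numpy as np

def simultaneous_cis(rewards, n_boot=2000, alpha=0.05, seed=0):
    """Gaussian-multiplier bootstrap for simultaneous confidence
    intervals on the total rewards of p decision makers.

    rewards: (T, p) array; row t holds the period-t rewards of all
    p decision makers, and p may be much larger than T.  Returns
    (lower, upper) arrays intended to cover all p total rewards
    simultaneously at level 1 - alpha, asymptotically.
    """
    T, p = rewards.shape
    totals = rewards.sum(axis=0)                    # estimated total rewards
    centered = rewards - rewards.mean(axis=0)       # center each stream
    scale = centered.std(axis=0, ddof=1) * np.sqrt(T)
    scale = np.where(scale > 0.0, scale, 1.0)       # guard constant streams
    rng = np.random.default_rng(seed)
    multipliers = rng.standard_normal((n_boot, T))  # one N(0, 1) per period
    boot_totals = multipliers @ centered            # (n_boot, p) perturbed totals
    max_stats = np.max(np.abs(boot_totals) / scale, axis=1)
    crit = np.quantile(max_stats, 1.0 - alpha)      # one shared critical value
    return totals - crit * scale, totals + crit * scale

# 500 decision makers observed over a horizon of only 60 periods.
rng = np.random.default_rng(1)
rewards = rng.normal(loc=1.0, scale=0.5, size=(60, 500))
lower, upper = simultaneous_cis(rewards)
print(lower.shape, upper.shape)
```

Because all p intervals share one bootstrapped critical value, the same construction doubles as a simultaneous test: a hypothesized vector of total rewards is rejected whenever any coordinate falls outside its interval.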
Citation
Xie, Xinchang (2020). Optimal and Near-optimal Policies of Sequential Decision Problems under Uncertainty. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/20936.
Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.