Managing Shared Resources in the Data Center Era
To improve efficiency and amortize cost over more computation, resource
sharing has become vital in high performance computing systems. In such
systems, the conventional wisdom assumes that users have to share,
regardless of the management policy. With a wide range of computing options
available, this assumption does not seem to hold for today's self-interested
users. These users selfishly pursue their individual performance without
regard for others or the system. And if they dislike management outcomes,
they will withdraw from the shared system. If they decide to share, they
will try to game the management system by misreporting their resource
demands to improve their performance, perhaps at the expense of others in
the system. To address this challenge and study strategic behavior of
self-interested users, game theory is known to be an effective tool. Drawing
on game theory, this thesis encourages new thinking in designing management
platforms robust to strategic behavior. In this thesis, we present five
pieces of work on data center management platforms.
First, with the democratization of cloud and datacenter computing, users
increasingly share large hardware platforms. In this setting, architects
encounter two challenges: sharing fairly and sharing multiple resources.
Drawing on game theory, we rethink fairness in computer
architecture. A fair allocation must provide sharing incentives (SI),
envy-freeness (EF), and Pareto efficiency (PE). We show that Cobb-Douglas
utility functions are well suited to modeling user preferences for cache
capacity and memory bandwidth. Additionally we present an allocation mechanism that
uses Cobb-Douglas preferences to determine each user's fair share of the
hardware. This mechanism provably guarantees SI, EF, and PE, as well as
strategy-proofness in the large (SPL). And it does so with modest
performance penalties, less than 10% throughput loss, relative to an unfair
Second, computational sprinting is a class of mechanisms that boost
performance but dissipate additional power. We describe a sprinting
architecture in which many independent chip multiprocessors share a power
supply and sprints are constrained by the chips' thermal limits and the
rack's power limits. Moreover, we present the computational sprinting game,
a multi-agent perspective on managing sprints. Strategic agents decide
whether to sprint based on application phases and system conditions. The
game produces an equilibrium that improves task throughput for data
analytics workloads by 4-6x over prior greedy heuristics and performs
within 90% of an upper bound on throughput from a globally optimized
Third, ensuring fairness in a system with scarce and commonly preferred resources
requires time sharing. We consider a heterogeneous system with a few
``big'' and many ``small'' processors. We allocate heterogeneous processors
using a novel token mechanism that supports game-theoretic notions of
fairness such as sharing incentives and envy-freeness. The mechanism frames
the allocation problem as a repeated game. In each round of the game, users
request big processors and spend a token if their request is granted. We
formulate game dynamics and optimize users' strategies to produce an
equilibrium. Allocations from optimal strategies balance performance and
fairness. Our token mechanism outperforms classical, fair mechanisms by
1.7x, on average, in total performance gains, and is competitive with a
performance maximizing mechanism.
Fourth, we present a processor allocation framework that uses Amdahl's Law to
model parallel performance and a market mechanism to allocate cores. We
propose the Amdahl utility function and demonstrate its accuracy when
modeling performance from processor core allocations. We then design a
market based on Amdahl utility and propose the Amdahl bidding procedure that
optimizes users' bids for processors based on workload parallelizability.
The framework uses entitlements to guarantee fairness yet outperforms
existing proportional share algorithms.
Finally, sharing computational resources amortizes cost and improves
utilization and efficiency. When agents pool their resources together,
each becomes entitled to a portion of the shared pool. Static
allocations in each round can guarantee entitlements and are
strategy-proof, but efficiency suffers because allocations do not
reflect variations in agents' demands for resources across rounds.
Dynamic allocation mechanisms assign resources to agents across
multiple rounds while guaranteeing agents their entitlements.
Designing dynamic mechanisms is challenging, however, when agents are
strategic and can benefit by misreporting their demands for resources.
The Amdahl bidding mechanism facilitates the trade in resources between
users with static demands within a single management round. To facilitate
the trade in resources between users with dynamic demands across multiple
rounds, we propose two novel mechanisms. First, the T-period mechanism
satisfies strategy-proofness and sharing incentives but with low efficiency.
Second, the token mechanism satisfies strategy-proofness and guarantees at
least a 50% approximation of sharing incentives, which means users receive
at least half the utility they would have received by not participating in
the mechanism. Through simulations on data gathered from Google clusters, we
show that the performance of the token mechanism is comparable to that of
state-of-the-art mechanisms that do not guarantee our game-theoretic
properties. Further, although the token mechanism only guarantees a 50%
approximation of sharing incentives, in practice, users receive at least
98\% of their sharing incentives guarantee.
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Rights for Collection: Duke Dissertations