Managing Shared Resources in the Data Center Era

dc.contributor.advisor

Lee, Benjamin C

dc.contributor.author

Zahedi, Seyed Majid

dc.date.accessioned

2018-05-31T21:13:47Z

dc.date.available

2018-05-31T21:13:47Z

dc.date.issued

2018

dc.department

Computer Science

dc.description.abstract

To improve efficiency and amortize cost over more computation, resource

sharing has become vital in high performance computing systems. In such

systems, the conventional wisdom assumes that users have to share,

regardless of the management policy. With a wide range of computing options

available, this assumption does not seem to hold for today's self-interested

users. These users selfishly pursue their individual performance without

regard for others or the system. And if they dislike management outcomes,

they will withdraw from the shared system. If they decide to share, they

will try to game the management system by misreporting their resource

demands to improve their performance, perhaps at the expense of others in

the system. To address this challenge and study strategic behavior of

self-interested users, game theory is known to be an effective tool. Drawing

on game theory, this thesis encourages new thinking in designing management

platforms robust to strategic behavior. In this thesis, we present five

pieces of work on data center management platforms.

First, with the democratization of cloud and datacenter computing, users

increasingly share large hardware platforms. In this setting, architects

encounter two challenges: sharing fairly and sharing multiple resources.

Drawing on game theory, we rethink fairness in computer

architecture. A fair allocation must provide sharing incentives (SI),

envy-freeness (EF), and Pareto efficiency (PE). We show that Cobb-Douglas

utility functions are well suited to modeling user preferences for cache

capacity and memory bandwidth. Additionally we present an allocation mechanism that

uses Cobb-Douglas preferences to determine each user's fair share of the

hardware. This mechanism provably guarantees SI, EF, and PE, as well as

strategy-proofness in the large (SPL). And it does so with modest

performance penalties, less than 10% throughput loss, relative to an unfair

mechanism.

Second, computational sprinting is a class of mechanisms that boost

performance but dissipate additional power. We describe a sprinting

architecture in which many independent chip multiprocessors share a power

supply and sprints are constrained by the chips' thermal limits and the

rack's power limits. Moreover, we present the computational sprinting game,

a multi-agent perspective on managing sprints. Strategic agents decide

whether to sprint based on application phases and system conditions. The

game produces an equilibrium that improves task throughput for data

analytics workloads by 4-6x over prior greedy heuristics and performs

within 90% of an upper bound on throughput from a globally optimized

policy.

Third, ensuring fairness in a system with scarce and commonly preferred resources

requires time sharing. We consider a heterogeneous system with a few

``big'' and many ``small'' processors. We allocate heterogeneous processors

using a novel token mechanism that supports game-theoretic notions of

fairness such as sharing incentives and envy-freeness. The mechanism frames

the allocation problem as a repeated game. In each round of the game, users

request big processors and spend a token if their request is granted. We

formulate game dynamics and optimize users' strategies to produce an

equilibrium. Allocations from optimal strategies balance performance and

fairness. Our token mechanism outperforms classical, fair mechanisms by

1.7x, on average, in total performance gains, and is competitive with a

performance maximizing mechanism.

Fourth, we present a processor allocation framework that uses Amdahl's Law to

model parallel performance and a market mechanism to allocate cores. We

propose the Amdahl utility function and demonstrate its accuracy when

modeling performance from processor core allocations. We then design a

market based on Amdahl utility and propose the Amdahl bidding procedure that

optimizes users' bids for processors based on workload parallelizability.

The framework uses entitlements to guarantee fairness yet outperforms

existing proportional share algorithms.

Finally, sharing computational resources amortizes cost and improves

utilization and efficiency. When agents pool their resources together,

each becomes entitled to a portion of the shared pool. Static

allocations in each round can guarantee entitlements and are

strategy-proof, but efficiency suffers because allocations do not

reflect variations in agents' demands for resources across rounds.

Dynamic allocation mechanisms assign resources to agents across

multiple rounds while guaranteeing agents their entitlements.

Designing dynamic mechanisms is challenging, however, when agents are

strategic and can benefit by misreporting their demands for resources.

The Amdahl bidding mechanism facilitates the trade in resources between

users with static demands within a single management round. To facilitate

the trade in resources between users with dynamic demands across multiple

rounds, we propose two novel mechanisms. First, the T-period mechanism

satisfies strategy-proofness and sharing incentives but with low efficiency.

Second, the token mechanism satisfies strategy-proofness and guarantees at

least a 50% approximation of sharing incentives, which means users receive

at least half the utility they would have received by not participating in

the mechanism. Through simulations on data gathered from Google clusters, we

show that the performance of the token mechanism is comparable to that of

state-of-the-art mechanisms that do not guarantee our game-theoretic

properties. Further, although the token mechanism only guarantees a 50%

approximation of sharing incentives, in practice, users receive at least

98\% of their sharing incentives guarantee.

dc.identifier.uri

https://hdl.handle.net/10161/16861

dc.subject

Computer science

dc.title

Managing Shared Resources in the Data Center Era

dc.type

Dissertation

Files

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Zahedi_duke_0066D_14486.pdf
Size:
3.02 MB
Format:
Adobe Portable Document Format

Collections