Show simple item record

Managing Shared Resources in the Data Center Era

dc.contributor.advisor Lee, Benjamin C. Zahedi, Seyed Majid 2018-05-31T21:13:47Z 2018-05-31T21:13:47Z 2018
dc.description.abstract <p>To improve efficiency and amortize cost over more computation, resource </p><p>sharing has become vital in high performance computing systems. In such </p><p>systems, the conventional wisdom assumes that users have to share, </p><p>regardless of the management policy. With a wide range of computing options </p><p>available, this assumption does not seem to hold for today's self-interested </p><p>users. These users selfishly pursue their individual performance without </p><p>regard for others or the system. And if they dislike management outcomes, </p><p>they will withdraw from the shared system. If they decide to share, they </p><p>will try to game the management system by misreporting their resource </p><p>demands to improve their performance, perhaps at the expense of others in </p><p>the system. To address this challenge and study strategic behavior of </p><p>self-interested users, game theory is known to be an effective tool. Drawing </p><p>on game theory, this thesis encourages new thinking in designing management </p><p>platforms robust to strategic behavior. In this thesis, we present five </p><p>pieces of work on data center management platforms. </p><p>First, with the democratization of cloud and datacenter computing, users </p><p>increasingly share large hardware platforms. In this setting, architects </p><p>encounter two challenges: sharing fairly and sharing multiple resources. </p><p>Drawing on game theory, we rethink fairness in computer </p><p>architecture. A fair allocation must provide sharing incentives (SI), </p><p>envy-freeness (EF), and Pareto efficiency (PE). We show that Cobb-Douglas </p><p>utility functions are well suited to modeling user preferences for cache </p><p>capacity and memory bandwidth. Additionally we present an allocation mechanism that </p><p>uses Cobb-Douglas preferences to determine each user's fair share of the </p><p>hardware. This mechanism provably guarantees SI, EF, and PE, as well as </p><p>strategy-proofness in the large (SPL). And it does so with modest </p><p>performance penalties, less than 10% throughput loss, relative to an unfair </p><p>mechanism.</p><p>Second, computational sprinting is a class of mechanisms that boost </p><p>performance but dissipate additional power. We describe a sprinting </p><p>architecture in which many independent chip multiprocessors share a power </p><p>supply and sprints are constrained by the chips' thermal limits and the </p><p>rack's power limits. Moreover, we present the computational sprinting game, </p><p>a multi-agent perspective on managing sprints. Strategic agents decide </p><p>whether to sprint based on application phases and system conditions. The </p><p>game produces an equilibrium that improves task throughput for data </p><p>analytics workloads by 4-6x over prior greedy heuristics and performs </p><p>within 90% of an upper bound on throughput from a globally optimized </p><p>policy.</p><p>Third, ensuring fairness in a system with scarce and commonly preferred resources </p><p>requires time sharing. We consider a heterogeneous system with a few </p><p>``big'' and many ``small'' processors. We allocate heterogeneous processors </p><p>using a novel token mechanism that supports game-theoretic notions of </p><p>fairness such as sharing incentives and envy-freeness. The mechanism frames </p><p>the allocation problem as a repeated game. In each round of the game, users </p><p>request big processors and spend a token if their request is granted. We </p><p>formulate game dynamics and optimize users' strategies to produce an </p><p>equilibrium. Allocations from optimal strategies balance performance and </p><p>fairness. Our token mechanism outperforms classical, fair mechanisms by </p><p>1.7x, on average, in total performance gains, and is competitive with a </p><p>performance maximizing mechanism.</p><p>Fourth, we present a processor allocation framework that uses Amdahl's Law to </p><p>model parallel performance and a market mechanism to allocate cores. We </p><p>propose the Amdahl utility function and demonstrate its accuracy when </p><p>modeling performance from processor core allocations. We then design a </p><p>market based on Amdahl utility and propose the Amdahl bidding procedure that </p><p>optimizes users' bids for processors based on workload parallelizability. </p><p>The framework uses entitlements to guarantee fairness yet outperforms </p><p>existing proportional share algorithms.</p><p>Finally, sharing computational resources amortizes cost and improves</p><p>utilization and efficiency. When agents pool their resources together,</p><p>each becomes entitled to a portion of the shared pool. Static</p><p>allocations in each round can guarantee entitlements and are</p><p>strategy-proof, but efficiency suffers because allocations do not</p><p>reflect variations in agents' demands for resources across rounds.</p><p>Dynamic allocation mechanisms assign resources to agents across</p><p>multiple rounds while guaranteeing agents their entitlements.</p><p>Designing dynamic mechanisms is challenging, however, when agents are</p><p>strategic and can benefit by misreporting their demands for resources.</p><p>The Amdahl bidding mechanism facilitates the trade in resources between </p><p>users with static demands within a single management round. To facilitate </p><p>the trade in resources between users with dynamic demands across multiple </p><p>rounds, we propose two novel mechanisms. First, the T-period mechanism </p><p>satisfies strategy-proofness and sharing incentives but with low efficiency. </p><p>Second, the token mechanism satisfies strategy-proofness and guarantees at </p><p>least a 50% approximation of sharing incentives, which means users receive </p><p>at least half the utility they would have received by not participating in </p><p>the mechanism. Through simulations on data gathered from Google clusters, we </p><p>show that the performance of the token mechanism is comparable to that of </p><p>state-of-the-art mechanisms that do not guarantee our game-theoretic </p><p>properties. Further, although the token mechanism only guarantees a 50% </p><p>approximation of sharing incentives, in practice, users receive at least </p><p>98\% of their sharing incentives guarantee.</p>
dc.subject Computer science
dc.title Managing Shared Resources in the Data Center Era
dc.type Dissertation
dc.department Computer Science

Files in this item


This item appears in the following Collection(s)

Show simple item record