Design and Management Strategies for Hardware Accelerators
Date
2022
Authors
Mehrabi, Atefeh
Abstract
Hardware acceleration, which entails customizing datapath and control logic for a specific application, has led to remarkable performance and power efficiency. Hardware accelerators have emerged in various forms including specialized processors (e.g., graphics processing unit [GPU]), fixed function custom circuits (e.g., application-specific integrated circuit [ASIC]), and reconfigurable fabrics (e.g., field-programmable gate array [FPGA]). Though accelerators provide excellent gains, the development of efficient ones is time-consuming, costly, and complex. Various techniques such as high-level synthesis (HLS) and FPGAs have been employed to reduce acceleration costs. However, hardware acceleration remains challenging due to the effort required to understand and optimize the design, as well as the limited system support available for efficient run-time management. Inefficient design mechanisms and a lack of proper run-time management policies for emerging platforms limit the pace at which accelerators can scale.
This dissertation investigates solutions to reduce the costs and complexities of developing and using hardware accelerators. We pursue these goals by applying intelligent design mechanisms and efficient run-time management policies. We present three pieces of research, each of which tackles a unique design or run-time challenge. First, designing hardware accelerators remains largely dependent on register-transfer level (RTL) design and writing hardware code (e.g., Verilog, VHDL), which is a tedious process. The growing demand for hardware accelerators has attracted more attention to faster solutions such as HLS. However, generating the optimal RTL with HLS requires tuning optimizations. This process can take days and is heavily reliant on designers. We build an automated framework called Prospector to further reduce the design effort with statistical learning. Prospector uses Bayesian optimization to tune optimizations intelligently and discover efficient designs.
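As a rough illustration of the kind of search Prospector automates, the sketch below tunes a few HLS directives with off-the-shelf Bayesian optimization (scikit-optimize's gp_minimize). The directive names, the run_hls() stub with its toy cost model, and the scalarized objective are assumptions made for this sketch, not Prospector's actual interface or models.

# Sketch: Bayesian-optimization-driven tuning of HLS directives.
from skopt import gp_minimize
from skopt.space import Integer, Categorical

# Hypothetical tunable directives for one accelerator kernel.
space = [
    Integer(1, 64, name="unroll_factor"),
    Integer(1, 16, name="partition_factor"),
    Categorical(["on", "off"], name="pipeline"),
]

def run_hls(unroll, partition, pipeline):
    """Placeholder for running the HLS tool and parsing its reports.
    A toy analytical model stands in so the sketch executes end to end."""
    latency = 1e6 / (unroll * partition) * (0.5 if pipeline == "on" else 1.0)
    area = 2_000 * unroll + 500 * partition
    return latency, area

def cost(x):
    unroll, partition, pipeline = x
    latency, area = run_hls(unroll, partition, pipeline)
    # Scalarize the two objectives: minimize latency while penalizing area.
    return latency + 0.05 * area

# The Gaussian-process surrogate proposes promising configurations,
# so far fewer synthesis runs are needed than an exhaustive sweep.
result = gp_minimize(cost, space, n_calls=25, random_state=0)
print("best directives:", result.x, "cost:", result.fun)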
Second, FPGAs have gained popularity due to their reconfigurability, which reduces the high fabrication costs of custom ASICs. The deployment of increasingly large and capable FPGAs as reconfigurable accelerators has motivated researchers to identify mechanisms for sharing FPGAs to further reduce hardware costs. However, unlike the support provided for general-purpose processors, FPGA system support is not yet mature. Traditional scheduling policies used to manage other shared resources (e.g., multi-core CPUs) do not account for the unique characteristics of FPGAs, leading to infeasible or inefficient allocations of FPGA resources and poor system outcomes. We study the unique characteristics of emerging FPGAs and propose a novel scheduling policy called spatiotemporal FPGA scheduling (STFS). STFS provides a flexible resource-sharing method that delivers predictable long-term allocations without significantly degrading system efficiency.
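The toy model below shows how spatial and temporal sharing might be combined: each job requests a number of reconfigurable regions, and a greedy round-robin packer fills each time slot spatially, then rotates served jobs to the back of the queue. The Job fields, region counts, and packing rule are illustrative assumptions; this is not the STFS algorithm itself.

# Toy spatiotemporal sharing model: jobs ask for reconfigurable regions
# (spatial share) and are rotated across fixed time slots (temporal share).
from collections import deque
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    regions: int  # reconfigurable regions requested

def schedule(jobs, total_regions, num_slots):
    """Greedy round-robin packing: fill each slot until the fabric is full;
    jobs that ran move to the back so long-term shares stay predictable."""
    queue = deque(jobs)
    slots = []
    for _ in range(num_slots):
        free = total_regions
        placed, skipped = [], []
        while queue and free > 0:
            job = queue.popleft()
            if job.regions <= free:
                placed.append(job)
                free -= job.regions
            else:
                skipped.append(job)
        slots.append([j.name for j in placed])
        # Unserved jobs keep priority for the next slot.
        queue = deque(skipped + list(queue) + placed)
    return slots

jobs = [Job("dnn", 3), Job("crypto", 2), Job("video", 2)]
print(schedule(jobs, total_regions=4, num_slots=3))
# [['dnn'], ['crypto', 'video'], ['dnn']]

In this example run, the large job alternates with the two smaller jobs across slots, so every job receives a predictable long-term share of the fabric.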
Finally, hardware-software co-design is especially challenging for kernels with irregular and data-dependent computation patterns. Sparse operations, which are at the heart of applications in many domains, are among the most challenging to accelerate. When mapped to parallel hardware such as GPUs, the distribution of the non-zero elements and the underlying hardware platform affect the execution efficiency. Tuning application kernels significantly influences how operations map to the underlying hardware accelerator. Given the diversity in workloads and architectures, there is no one-size-fits-all solution. We focus on improving sparse matrix multi-vector multiplication (SpMM) performance on GPU architectures. We propose an intelligent mechanism for optimizing SpMM performance based on the sparsity pattern in the data. Our solution relies on developing a set of simple but effective data permutation policies that improve GPU resource utilization and/or memory access latency.
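One simple policy of this flavor is to permute the rows of the sparse matrix by nonzero count so that similarly sized rows are grouped, which tends to balance work across parallel threads. The sketch below (using SciPy on the CPU for brevity) applies such a permutation before SpMM and undoes it on the output; the specific policy and problem sizes are illustrative assumptions, not the dissertation's exact method.

# Sketch: reorder sparse-matrix rows by nonzero count before SpMM,
# then scatter the result rows back to their original order.
import numpy as np
import scipy.sparse as sp

def permute_rows_by_nnz(A_csr):
    """Return (row-permuted matrix, permutation), densest rows first."""
    nnz_per_row = np.diff(A_csr.indptr)   # nonzeros in each row of the CSR matrix
    perm = np.argsort(nnz_per_row)[::-1]
    return A_csr[perm], perm

A = sp.random(1024, 1024, density=0.01, format="csr", random_state=0)
B = np.random.rand(1024, 8)               # dense multi-vector (8 right-hand sides)

A_perm, perm = permute_rows_by_nnz(A)
C_perm = np.asarray(A_perm @ B)           # SpMM on the permuted operand
C = np.empty_like(C_perm)
C[perm] = C_perm                          # undo the permutation on the output
assert np.allclose(C, np.asarray(A @ B))  # matches the unpermuted SpMM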
Citation
Mehrabi, Atefeh (2022). Design and Management Strategies for Hardware Accelerators. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/25282.