Design and Management Strategies for Hardware Accelerators

dc.contributor.advisor: Sorin, Daniel
dc.contributor.advisor: Lee, Benjamin
dc.contributor.author: Mehrabi, Atefeh
dc.date.accessioned: 2022-06-15T18:44:27Z
dc.date.available: 2022-06-15T18:44:27Z
dc.date.issued: 2022
dc.department: Electrical and Computer Engineering

dc.description.abstract

Hardware acceleration, which entails customizing datapath and control logic for a specific application, has led to remarkable performance and power efficiency. Hardware accelerators have emerged in various forms, including specialized processors (e.g., graphics processing units [GPUs]), fixed-function custom circuits (e.g., application-specific integrated circuits [ASICs]), and reconfigurable fabrics (e.g., field-programmable gate arrays [FPGAs]). Though accelerators provide excellent gains, developing efficient ones is time-consuming, costly, and complex. Various techniques such as high-level synthesis (HLS) and FPGAs have been employed to reduce acceleration costs. However, hardware acceleration remains challenging due to the effort required to understand and optimize the design, as well as the limited system support available for efficient run-time management. Inefficient design mechanisms and a lack of proper run-time management policies for emerging platforms limit the pace at which accelerators can scale.

This dissertation investigates solutions to reduce the costs and complexities of developing and using hardware accelerators. We pursue these goals by applying intelligent design mechanisms and efficient run-time management policies. We present three pieces of research, each of which tackles a unique design or run-time challenge. First, designing hardware accelerators remains largely dependent on register-transfer level (RTL) design and writing hardware code (i.e., Verilog, VHDL), which is a tedious process. The growing demand for hardware accelerators has attracted more attention to faster solutions such as HLS. However, generating the optimal RTL with HLS requires tuning optimizations. This process can take days and is heavily reliant on designers. We build an automated framework called Prospector to further reduce the design effort with statistical learning. Prospector uses Bayesian optimization to tune optimizations intelligently and discover efficient designs.
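The abstract does not give Prospector's formulation, but the core idea of surrogate-driven tuning can be sketched in miniature. Everything below is invented for illustration: the HLS knob space (unroll factor, pipelining flag, partition factor), the toy latency model standing in for a real synthesis run, and the simple inverse-distance surrogate with an exploration bonus in place of a true Gaussian-process acquisition function.

```python
import math
import random

# Hypothetical HLS knob space (assumption, not Prospector's real one):
# (loop unroll factor, pipelining on/off, memory partition factor).
SPACE = [(u, p, m) for u in (1, 2, 4, 8)
                   for p in (0, 1)
                   for m in (1, 2, 4)]

def latency(cfg):
    """Toy cost model standing in for an actual HLS synthesis run."""
    u, p, m = cfg
    return 1000 / (u * (2 if p else 1)) + 50 * u / m + 20 * m

def surrogate(cfg, evaluated):
    """Predict cost as an inverse-distance-weighted mean of evaluated points."""
    num = den = 0.0
    for c, y in evaluated.items():
        d = sum((a - b) ** 2 for a, b in zip(cfg, c)) + 1e-6
        num += y / d
        den += 1 / d
    return num / den

def tune(rounds=12, seed=0):
    """Sequential model-based search: evaluate a few random configs,
    then repeatedly synthesize the config the surrogate favors."""
    rng = random.Random(seed)
    evaluated = {cfg: latency(cfg) for cfg in rng.sample(SPACE, 3)}
    for _ in range(rounds):
        def score(cfg):
            # Predicted cost minus a bonus for unexplored regions.
            d_min = min(sum((a - b) ** 2 for a, b in zip(cfg, c))
                        for c in evaluated)
            return surrogate(cfg, evaluated) - 5.0 * math.sqrt(d_min)
        nxt = min((c for c in SPACE if c not in evaluated), key=score)
        evaluated[nxt] = latency(nxt)
    return min(evaluated, key=evaluated.get), evaluated

best, evaluated = tune()
```

The point of the sketch is the evaluation budget: the loop synthesizes only a fraction of the design space yet converges toward low-latency configurations, which is what makes surrogate-based tuning attractive when each HLS run takes hours.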

Second, FPGAs have gained popularity due to their reconfigurability, which reduces the high fabrication costs of custom ASIC circuits. The deployment of increasingly large and capable FPGAs as reconfigurable accelerators has motivated researchers to identify mechanisms for sharing FPGAs to further reduce hardware costs. However, unlike the support provided for general-purpose processors, FPGA system support is not yet mature. Traditional scheduling policies used to manage other shared resources (e.g., multi-core CPUs) do not account for the unique characteristics of FPGAs, leading to infeasible or inefficient allocations of FPGA resources and poor system outcomes. We study the unique characteristics of emerging FPGAs and propose a novel scheduling policy called spatiotemporal FPGA scheduling (STFS). STFS provides a flexible resource-sharing method that delivers predictable long-term allocations without significantly degrading system efficiency.
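The STFS algorithm itself is not described in the abstract, but the spatiotemporal framing can be illustrated with a toy model: an FPGA partitioned into a fixed number of reconfigurable regions (the spatial dimension) shared across discrete time slots (the temporal dimension). The greedy packer below is purely illustrative; the task set, region counts, and largest-first heuristic are all assumptions, not the dissertation's policy.

```python
from collections import namedtuple

# A task asks for some number of reconfigurable regions for some
# number of consecutive time slots (illustrative abstraction).
Task = namedtuple("Task", "name regions duration")

def schedule(tasks, total_regions):
    """Greedily place tasks at the earliest time slot where enough
    FPGA regions stay free for the task's whole duration."""
    timeline = []     # timeline[t] = regions still free in slot t
    placement = {}    # task name -> (start slot, regions granted)
    for task in sorted(tasks, key=lambda t: -t.regions):
        start = 0
        while True:
            # Grow the timeline lazily as we probe later slots.
            while len(timeline) < start + task.duration:
                timeline.append(total_regions)
            # Spatial feasibility across the full temporal window?
            window = range(start, start + task.duration)
            if all(timeline[s] >= task.regions for s in window):
                for s in window:
                    timeline[s] -= task.regions
                placement[task.name] = (start, task.regions)
                break
            start += 1
    return placement, len(timeline)
```

For example, on a 4-region FPGA, tasks A (3 regions, 2 slots), B (2 regions, 3 slots), and C (2 regions, 1 slot) cannot all run at slot 0: A is placed first, and B and C are deferred until A's regions free up, giving the kind of joint space-and-time allocation decision a CPU-style scheduler never has to make.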

Finally, hardware-software co-design is especially challenging for kernels with irregular and data-dependent computation patterns. Sparse operations, which are at the heart of applications in many domains, are among the most challenging to accelerate. When mapped to parallel hardware like GPUs, the distribution of the non-zero elements and the underlying hardware platform affect execution efficiency. Tuning application kernels significantly influences how operations map to the underlying hardware accelerator. Given the diversity in workloads and architectures, there is no one-size-fits-all solution. We focus on accelerating sparse matrix multi-vector multiplication (SpMM) on GPU architectures. We propose an intelligent mechanism for optimizing SpMM performance based on the sparsity pattern in the data. Our solution relies on developing a set of simple but effective data permutation policies that improve GPU resource utilization and/or memory access latency.
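The specific permutation policies are not spelled out in the abstract; as one assumed, simplified instance of the idea, the sketch below reorders the rows of a CSR-format sparse matrix by descending nonzero count. Grouping rows of similar length is a standard way to reduce load imbalance when rows are mapped to GPU threads or warps, which is the flavor of resource-utilization improvement the paragraph describes.

```python
# Illustrative sparsity-aware row permutation on a CSR matrix
# (indptr/indices/data are the usual CSR arrays). This is a toy
# stand-in for the dissertation's permutation policies, not them.

def row_nnz(indptr):
    """Nonzeros per row from the CSR row-pointer array."""
    return [indptr[i + 1] - indptr[i] for i in range(len(indptr) - 1)]

def permute_by_nnz(indptr, indices, data):
    """Rebuild the CSR arrays with rows sorted by descending nonzero
    count, returning the row order plus the permuted arrays."""
    nnz = row_nnz(indptr)
    order = sorted(range(len(nnz)), key=lambda r: -nnz[r])
    new_indptr, new_indices, new_data = [0], [], []
    for r in order:
        lo, hi = indptr[r], indptr[r + 1]
        new_indices.extend(indices[lo:hi])
        new_data.extend(data[lo:hi])
        new_indptr.append(len(new_indices))
    return order, new_indptr, new_indices, new_data
```

After the permutation, adjacent rows have similar work, so threads assigned to neighboring rows finish together instead of idling behind one long row; the multiplication result is recovered by applying the inverse permutation to the output rows.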

dc.identifier.uri: https://hdl.handle.net/10161/25282
dc.subject: Computer engineering
dc.title: Design and Management Strategies for Hardware Accelerators
dc.type: Dissertation

Files: Mehrabi_duke_0066D_16775.pdf (6.4 MB, Adobe Portable Document Format)