Design and Management Strategies for Hardware Accelerators

dc.contributor.advisor: Sorin, Daniel
dc.contributor.advisor: Lee, Benjamin
dc.contributor.author: Mehrabi, Atefeh
dc.date.accessioned: 2022-06-15T18:44:27Z
dc.date.available: 2022-06-15T18:44:27Z
dc.date.issued: 2022
dc.department: Electrical and Computer Engineering

dc.description.abstract

Hardware acceleration, which entails customizing datapath and control logic for a specific application, has led to remarkable performance and power efficiency. Hardware accelerators have emerged in various forms, including specialized processors (e.g., graphics processing units [GPUs]), fixed-function custom circuits (e.g., application-specific integrated circuits [ASICs]), and reconfigurable fabrics (e.g., field-programmable gate arrays [FPGAs]). Though accelerators provide excellent gains, developing efficient ones is time-consuming, costly, and complex. Various techniques such as high-level synthesis (HLS) and FPGAs have been employed to reduce acceleration costs. However, hardware acceleration remains challenging due to the effort required to understand and optimize the design, as well as the limited system support available for efficient run-time management. Inefficient design mechanisms and a lack of proper run-time management policies for emerging platforms limit the pace at which accelerators can scale.

This dissertation investigates solutions to reduce the costs and complexities of developing and using hardware accelerators. We pursue these goals by applying intelligent design mechanisms and efficient run-time management policies. We present three pieces of research, each of which tackles a unique design or run-time challenge. First, designing hardware accelerators remains largely dependent on register-transfer level (RTL) design and writing hardware code (i.e., Verilog, VHDL), which is a tedious process. The growing demand for hardware accelerators has attracted more attention to faster solutions such as HLS. However, generating the optimal RTL with HLS requires tuning optimizations. This process can take days and is heavily reliant on designers. We build an automated framework called Prospector to further reduce the design effort with statistical learning. Prospector uses Bayesian optimization to tune optimizations intelligently and discover efficient designs.
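The abstract does not give Prospector's formulation, but the core idea of surrogate-driven tuning can be sketched in miniature. Everything below is invented for illustration: the HLS knob space (unroll factor, pipelining flag, partition factor), the toy latency model standing in for a real synthesis run, and the simple inverse-distance surrogate with an exploration bonus in place of a true Gaussian-process acquisition function.

```python
import math
import random

# Hypothetical HLS knob space (assumption, not Prospector's real one):
# (loop unroll factor, pipelining on/off, memory partition factor).
SPACE = [(u, p, m) for u in (1, 2, 4, 8)
                   for p in (0, 1)
                   for m in (1, 2, 4)]

def latency(cfg):
    """Toy cost model standing in for an actual HLS synthesis run."""
    u, p, m = cfg
    return 1000 / (u * (2 if p else 1)) + 50 * u / m + 20 * m

def surrogate(cfg, evaluated):
    """Predict cost as an inverse-distance-weighted mean of evaluated points."""
    num = den = 0.0
    for c, y in evaluated.items():
        d = sum((a - b) ** 2 for a, b in zip(cfg, c)) + 1e-6
        num += y / d
        den += 1 / d
    return num / den

def tune(rounds=12, seed=0):
    """Sequential model-based search: evaluate a few random configs,
    then repeatedly synthesize the config the surrogate favors."""
    rng = random.Random(seed)
    evaluated = {cfg: latency(cfg) for cfg in rng.sample(SPACE, 3)}
    for _ in range(rounds):
        def score(cfg):
            # Predicted cost minus a bonus for unexplored regions.
            d_min = min(sum((a - b) ** 2 for a, b in zip(cfg, c))
                        for c in evaluated)
            return surrogate(cfg, evaluated) - 5.0 * math.sqrt(d_min)
        nxt = min((c for c in SPACE if c not in evaluated), key=score)
        evaluated[nxt] = latency(nxt)
    return min(evaluated, key=evaluated.get), evaluated

best, evaluated = tune()
```

The point of the sketch is the evaluation budget: the loop synthesizes only a fraction of the design space yet converges toward low-latency configurations, which is what makes surrogate-based tuning attractive when each HLS run takes hours.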

Second, FPGAs have gained popularity due to their reconfigurability, which reduces the high fabrication costs of custom ASIC circuits. The deployment of increasingly large and capable FPGAs as reconfigurable accelerators has motivated researchers to identify mechanisms for sharing FPGAs to further reduce hardware costs. However, unlike the support provided for general-purpose processors, FPGA system support is not yet mature. Traditional scheduling policies used to manage other shared resources (e.g., multi-core CPUs) do not account for the unique characteristics of FPGAs, leading to infeasible or inefficient allocations of FPGA resources and poor system outcomes. We study the unique characteristics of emerging FPGAs and propose a novel scheduling policy called spatiotemporal FPGA scheduling (STFS). STFS provides a flexible resource-sharing method that delivers predictable long-term allocations without significantly degrading system efficiency.
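The STFS algorithm itself is not described in the abstract, but the spatiotemporal framing can be illustrated with a toy model: an FPGA partitioned into a fixed number of reconfigurable regions (the spatial dimension) shared across discrete time slots (the temporal dimension). The greedy packer below is purely illustrative; the task set, region counts, and largest-first heuristic are all assumptions, not the dissertation's policy.

```python
from collections import namedtuple

# A task asks for some number of reconfigurable regions for some
# number of consecutive time slots (illustrative abstraction).
Task = namedtuple("Task", "name regions duration")

def schedule(tasks, total_regions):
    """Greedily place tasks at the earliest time slot where enough
    FPGA regions stay free for the task's whole duration."""
    timeline = []     # timeline[t] = regions still free in slot t
    placement = {}    # task name -> (start slot, regions granted)
    for task in sorted(tasks, key=lambda t: -t.regions):
        start = 0
        while True:
            # Grow the timeline lazily as we probe later slots.
            while len(timeline) < start + task.duration:
                timeline.append(total_regions)
            # Spatial feasibility across the full temporal window?
            window = range(start, start + task.duration)
            if all(timeline[s] >= task.regions for s in window):
                for s in window:
                    timeline[s] -= task.regions
                placement[task.name] = (start, task.regions)
                break
            start += 1
    return placement, len(timeline)
```

For example, on a 4-region FPGA, tasks A (3 regions, 2 slots), B (2 regions, 3 slots), and C (2 regions, 1 slot) cannot all run at slot 0: A is placed first, and B and C are deferred until A's regions free up, giving the kind of joint space-and-time allocation decision a CPU-style scheduler never has to make.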

Finally, hardware-software co-design is especially challenging for kernels with irregular and data-dependent computation patterns. Sparse operations, which are at the heart of applications in many domains, are among the most challenging to accelerate. When mapped to parallel hardware like GPUs, the distribution of the non-zero elements and the underlying hardware platform affect execution efficiency. Tuning application kernels significantly influences how operations map to the underlying hardware accelerator. Given the diversity in workloads and architectures, there is no one-size-fits-all solution. We focus on accelerating sparse matrix multi-vector multiplication (SpMM) on GPU architectures. We propose an intelligent mechanism for optimizing SpMM performance based on the sparsity pattern in the data. Our solution relies on developing a set of simple but effective data permutation policies that improve GPU resource utilization and/or memory access latency.
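The specific permutation policies are not spelled out in the abstract; as one assumed, simplified instance of the idea, the sketch below reorders the rows of a CSR-format sparse matrix by descending nonzero count. Grouping rows of similar length is a standard way to reduce load imbalance when rows are mapped to GPU threads or warps, which is the flavor of resource-utilization improvement the paragraph describes.

```python
# Illustrative sparsity-aware row permutation on a CSR matrix
# (indptr/indices/data are the usual CSR arrays). This is a toy
# stand-in for the dissertation's permutation policies, not them.

def row_nnz(indptr):
    """Nonzeros per row from the CSR row-pointer array."""
    return [indptr[i + 1] - indptr[i] for i in range(len(indptr) - 1)]

def permute_by_nnz(indptr, indices, data):
    """Rebuild the CSR arrays with rows sorted by descending nonzero
    count, returning the row order plus the permuted arrays."""
    nnz = row_nnz(indptr)
    order = sorted(range(len(nnz)), key=lambda r: -nnz[r])
    new_indptr, new_indices, new_data = [0], [], []
    for r in order:
        lo, hi = indptr[r], indptr[r + 1]
        new_indices.extend(indices[lo:hi])
        new_data.extend(data[lo:hi])
        new_indptr.append(len(new_indices))
    return order, new_indptr, new_indices, new_data
```

After the permutation, adjacent rows have similar work, so threads assigned to neighboring rows finish together instead of idling behind one long row; the multiplication result is recovered by applying the inverse permutation to the output rows.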

dc.identifier.uri: https://hdl.handle.net/10161/25282
dc.subject: Computer engineering
dc.title: Design and Management Strategies for Hardware Accelerators
dc.type: Dissertation

Files: Mehrabi_duke_0066D_16775.pdf (6.4 MB, Adobe Portable Document Format)