Software-Hardware Co-design For Deep Learning Model Acceleration


Date

2023


Abstract

Current deep neural network (DNN) models have shown beyond-human performance on multiple artificial intelligence tasks. However, state-of-the-art DNN models still suffer from efficiency issues that pose significant obstacles to their practical application in real-world scenarios. To further improve performance, modern DNN models keep growing in model size and number of operations, which makes deploying them on mobile and edge devices a great challenge given these devices' limited memory, compute resources, and battery energy. This dissertation seeks to address these challenges by advancing and integrating techniques from both software and hardware design for efficient DNN training and inference, with the ultimate goal of developing accurate and efficient DNN models.

My research primarily focuses on advancing model compression techniques such as pruning and quantization to push the boundaries of the efficiency-accuracy tradeoff in DNN models. For pruning, I propose Efficient Structural Sparsity (ESS), a learning framework that learns efficient structured sparsity in DNN models. Additionally, I extend ESS to acoustic applications such as speech recognition and speaker identification, demonstrating its effectiveness in various contexts. For quantization, I propose the Heterogeneously Compressed Ensemble (HCE), a novel, straightforward method that builds an efficient ensemble from pruned and quantized variants of a pretrained DNN model. These efforts have resulted in DNN models that are more accurate and efficient than those produced by existing state-of-the-art model compression methods. For hardware design, I designed an end-to-end neural-network-enhanced radar signal processing system on FPGA, with an implementation carefully optimized for a better tradeoff between performance and energy efficiency. Finally, for software-hardware co-design, I propose Hessian-Aware N:M (HANM) pruning, a novel search method that finds the optimal mixed N:M sparsity scheme for a deep neural network. On the hardware side, we design and simulate a corresponding hardware architecture that supports the various N:M sparsity schemes, allowing HANM to demonstrate optimal performance in real-world inference scenarios.
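For readers unfamiliar with N:M sparsity: in every consecutive group of M weights, at most N entries remain nonzero, a pattern that commodity sparse hardware can exploit. The sketch below is only an illustrative magnitude-based N:M masking pass (the function name `nm_prune` and the magnitude criterion are my assumptions for illustration); it is not the Hessian-aware search that HANM actually performs.

```python
import numpy as np

def nm_prune(weights, n=2, m=4):
    """Illustrative N:M sparsity: in each consecutive group of m weights,
    keep the n largest-magnitude entries and zero out the rest.
    (Magnitude-based masking sketch, not the HANM search itself.)"""
    w = np.asarray(weights, dtype=float).ravel()
    assert w.size % m == 0, "weight count must be divisible by m"
    groups = w.reshape(-1, m)
    # column indices of the n largest-|w| entries in each group
    keep = np.argsort(np.abs(groups), axis=1)[:, -n:]
    mask = np.zeros_like(groups)
    np.put_along_axis(mask, keep, 1.0, axis=1)  # 1.0 where weights survive
    return (groups * mask).reshape(np.shape(weights))

w = np.array([0.1, -0.9, 0.4, 0.05, 0.7, -0.2, 0.03, 0.6])
print(nm_prune(w, n=2, m=4))
# each group of 4 keeps only its two largest-magnitude weights:
# [ 0.  -0.9  0.4  0.   0.7  0.   0.   0.6]
```

A mixed N:M scheme, as searched for by HANM, would assign a different (N, M) pair to each layer rather than a single global pattern.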

This dissertation research aims to pave the way toward a better tradeoff among accuracy, efficiency, and power consumption in DNN models, ultimately leading to the development of DNN models that are both accurate and efficient.

Citation


Zhang, Jingchi (2023). Software-Hardware Co-design For Deep Learning Model Acceleration. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/27729.

Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.