Software-Hardware Co-design For Deep Learning Model Acceleration


Date

2023


Abstract

Current deep neural network (DNN) models have shown beyond-human performance on multiple artificial intelligence tasks. However, state-of-the-art DNN models still suffer from efficiency issues that pose significant obstacles to their practical application in real-world scenarios. To further improve performance, modern DNN models keep growing in model size and number of operations, which makes deploying them on mobile and edge devices a great challenge given these devices' limited memory, compute resources, and battery energy. This dissertation seeks to address these challenges by advancing and integrating techniques from both software and hardware design for efficient DNN training and inference, with the ultimate goal of developing accurate and efficient DNN models.

My research primarily focuses on advancing model compression techniques such as pruning and quantization to push the boundaries of the efficiency-accuracy tradeoff in DNN models. For pruning, I propose Efficient Structural Sparsity (ESS), a learning framework that learns efficient structured sparsity in DNN models. Additionally, I extend ESS to acoustic applications such as speech recognition and speaker identification, demonstrating its effectiveness in various contexts. For quantization, I propose the Heterogeneously Compressed Ensemble (HCE), a novel, straightforward method that builds an efficient ensemble from pruned and quantized variants of a pretrained DNN model. These efforts have resulted in DNN models that are more accurate and efficient than those produced by existing state-of-the-art model compression methods. For hardware design, I designed an end-to-end neural-network-enhanced radar signal processing system on FPGA, with an implementation carefully optimized for a better tradeoff between performance and energy efficiency. Finally, for software-hardware co-design, I propose Hessian-Aware N:M (HANM) pruning, a novel search method that finds the optimal mixed N:M sparsity scheme for a deep neural network. On the hardware side, we design and simulate a corresponding hardware architecture that supports the various N:M sparsity schemes, allowing HANM to demonstrate optimal performance in real-world inference scenarios.
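For readers unfamiliar with N:M sparsity: in every consecutive group of M weights, at most N entries remain nonzero, a pattern that commodity sparse hardware can exploit. The sketch below is only an illustrative magnitude-based N:M masking pass (the function name `nm_prune` and the magnitude criterion are my assumptions for illustration); it is not the Hessian-aware search that HANM actually performs.

```python
import numpy as np

def nm_prune(weights, n=2, m=4):
    """Illustrative N:M sparsity: in each consecutive group of m weights,
    keep the n largest-magnitude entries and zero out the rest.
    (Magnitude-based masking sketch, not the HANM search itself.)"""
    w = np.asarray(weights, dtype=float).ravel()
    assert w.size % m == 0, "weight count must be divisible by m"
    groups = w.reshape(-1, m)
    # column indices of the n largest-|w| entries in each group
    keep = np.argsort(np.abs(groups), axis=1)[:, -n:]
    mask = np.zeros_like(groups)
    np.put_along_axis(mask, keep, 1.0, axis=1)  # 1.0 where weights survive
    return (groups * mask).reshape(np.shape(weights))

w = np.array([0.1, -0.9, 0.4, 0.05, 0.7, -0.2, 0.03, 0.6])
print(nm_prune(w, n=2, m=4))
# each group of 4 keeps only its two largest-magnitude weights:
# [ 0.  -0.9  0.4  0.   0.7  0.   0.   0.6]
```

A mixed N:M scheme, as searched for by HANM, would assign a different (N, M) pair to each layer rather than a single global pattern.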

This dissertation research aims to pave the way toward a better tradeoff among accuracy, efficiency, and power consumption in DNN models, ultimately leading to the development of DNN models that are both accurate and efficient.

Citation


Zhang, Jingchi (2023). Software-Hardware Co-design For Deep Learning Model Acceleration. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/27729.

Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.