Algorithms and Architectures for Resilient Hardware Design
Date
2024
Abstract
The increasing complexity of hardware designs has introduced significant challenges in maintaining system resilience. The widespread use of third-party hardware components and IPs exacerbates this issue, increasing the likelihood of security problems. Although each IP has been verified by its vendor, integrating multiple IPs into a System-on-Chip (SoC) introduces complex interactions that are difficult to fully verify, which can lead to unforeseen issues post-deployment.
Hardware vulnerabilities, deeply embedded within the system, often remain undetected until they are exploited, posing serious risks to the security of the system. Unlike software, which can be easily updated and patched after deployment, resolving hardware bugs after production is far more difficult and costly. Hardware reliability is not limited to security concerns; faults also play a critical role. Therefore, building a reliable system requires addressing both security vulnerabilities and hardware faults.
Because patching hardware vulnerabilities and mitigating faults after deployment is complex and resource-intensive, designing resilient hardware becomes increasingly important as security demands evolve. This involves not only preventing potential vulnerabilities but also incorporating mechanisms to patch hardware bugs and mitigate faults at runtime. Designing hardware with built-in flexibility for in-field patching and fault management is challenging, particularly in balancing performance against resource constraints. The key question is: how many additional resources can be integrated into the hardware, and how effectively can the patching logic function, without imposing excessive overhead on the system?
To address these challenges, this dissertation focuses on the algorithms and architectures needed to build resilient hardware designs. It begins by introducing a hardware-based patching block, providing a systematic approach for system integrators to design a patching architecture for an SoC. This approach allocates resources across various IPs based on their importance, leading to a more resilient system.
The dissertation then addresses the lack of standardized metrics for evaluating patching hardware. It is the first to define the concept of theoretical "patchability", which comprises the observability and controllability of patching hardware. Patchability is quantified using probabilistic models combined with heuristic methods, and a tool is developed to demonstrate how patchability can be evaluated across different hardware configurations.
Building on this foundation, the dissertation refines patchability to take the patching hardware architecture into account. A fully customizable patching block is proposed to meet the specific needs of each IP in an SoC. The patchability quantification is revised so that it accurately reflects the differences between patching blocks and scales with the parameters of the patching hardware. The dissertation explores various design options, comparing their ability to address hardware vulnerabilities and highlighting the effectiveness of the proposed practical patchability framework.
In addition to hardware-based approaches for enhancing system resilience, the dissertation also presents an algorithm for identifying security assets at the register-transfer level (RTL) during the early stages of design. This helps prevent vulnerabilities from remaining hidden after deployment. Structural analysis is performed on graphs generated from signal relationships and data/control flow at the RTL, and each asset type is classified based on unique structural patterns. The approach demonstrates that security assets can be identified with very high accuracy.
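To make the idea of structural analysis on a signal graph concrete, the following is a minimal sketch and not the dissertation's algorithm. It assumes a hypothetical directed graph in which an edge means "source signal drives destination signal", and it computes three illustrative structural features (fan-in, fan-out, and membership in a feedback loop). The signal names, the edge list, and the choice of features are all assumptions made for illustration; the abstract does not state which patterns the actual classifier uses.

```python
from collections import defaultdict


def build_graph(edges):
    """Build successor/predecessor sets for a directed signal graph."""
    succ, pred = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        succ[src].add(dst)
        pred[dst].add(src)
    return succ, pred


def on_cycle(node, succ):
    """Return True if `node` can reach itself, i.e. it lies on a feedback loop."""
    stack, seen = list(succ[node]), set()
    while stack:
        cur = stack.pop()
        if cur == node:
            return True
        if cur not in seen:
            seen.add(cur)
            stack.extend(succ[cur])
    return False


def features(node, succ, pred):
    """Illustrative structural features of one signal (hypothetical choice)."""
    return {
        "fan_in": len(pred[node]),
        "fan_out": len(succ[node]),
        "feedback": on_cycle(node, succ),
    }


# Hypothetical dependency edges, as if extracted from a small RTL module.
edges = [
    ("key_in", "key_reg"),
    ("key_reg", "round_key"),
    ("start", "next_state"),
    ("state", "next_state"),
    ("next_state", "state"),
    ("state", "done"),
]
succ, pred = build_graph(edges)
key_features = features("key_reg", succ, pred)
state_features = features("state", succ, pred)
```

In this toy graph the register `state` sits on a feedback loop while `key_reg` does not, which is the kind of structural difference a pattern-based classifier could exploit. A real flow would derive the graph from parsed RTL and use a richer feature set.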
Furthermore, the dissertation incorporates fault-tolerance designs to complement the patching of security bugs. A checksum-based fault detection mechanism and a processing element (PE)-level fault localization method are proposed to enhance the resilience of deep neural network (DNN) accelerators at runtime. By leveraging the characteristics of matrix multiplication, these methods provide effective and resource-efficient fault detection and localization without relying on test patterns or precomputed values. The proposed hardware architecture enables these fault-tolerance techniques with minimal overhead, achieving a 100% fault detection and localization rate.
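The checksum idea can be illustrated with the classic algorithm-based fault tolerance property of matrix multiplication: for C = A x B, every row sum and column sum of C can be predicted from checksums of A and B alone. The sketch below is a software illustration under stated assumptions, not the dissertation's hardware design. It assumes integer arithmetic (floating-point data would need a tolerance), a single faulty output element, and a hypothetical mapping in which output element (i, j) is produced by PE (i, j). The function names `matmul` and `locate_faults` are made up for this example.

```python
def matmul(a, b):
    """Plain integer matrix product C = A x B on lists of lists."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]


def locate_faults(a, b, c):
    """Return (bad_rows, bad_cols) where C disagrees with operand checksums.

    Row check:    sum_j C[i][j] == sum_t A[i][t] * rowsum(B)[t]
    Column check: sum_i C[i][j] == sum_t colsum(A)[t] * B[t][j]
    """
    n, k, m = len(a), len(b), len(b[0])
    b_row_sums = [sum(b[t]) for t in range(k)]
    a_col_sums = [sum(a[i][t] for i in range(n)) for t in range(k)]
    bad_rows = [i for i in range(n)
                if sum(c[i]) != sum(a[i][t] * b_row_sums[t] for t in range(k))]
    bad_cols = [j for j in range(m)
                if sum(c[i][j] for i in range(n))
                != sum(a_col_sums[t] * b[t][j] for t in range(k))]
    return bad_rows, bad_cols


a = [[1, 2], [3, 4], [5, 6]]
b = [[7, 8, 9], [10, 11, 12]]
c = matmul(a, b)
clean = locate_faults(a, b, c)    # no mismatches expected

c[1][2] ^= 1 << 4                 # inject a single bit flip into element (1, 2)
faulty = locate_faults(a, b, c)   # the mismatching row and column meet at (1, 2)
```

The checksums are derived from the live operands, so no test patterns or precomputed reference outputs are involved. The intersection of the mismatching row and column identifies the faulty output element, and under the assumed mapping, the PE that produced it.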
In summary, this dissertation addresses the challenges faced by system integrators in building resilient hardware designs. It fills gaps in current practices by providing both theoretical and practical formulations of patchability and demonstrates their usefulness. Additionally, this research offers both hardware- and software-based solutions to mitigate vulnerabilities and enhance system resilience. The proposed approaches give hardware designers multiple pathways to make their designs more robust.
Citation
Liu, Wei-Kai (2024). Algorithms and Architectures for Resilient Hardware Design. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/32610.
Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.