Improving the Efficiency and Robustness of In-Memory Computing in Emerging Technologies
Date
2023
Authors
Yang, Xiaoxuan
Abstract
Emerging technologies, such as resistive random-access memory (ReRAM), have demonstrated their potential for in-memory computing in deep learning applications. My dissertation work focuses on improving the efficiency and robustness of in-memory computing built on these emerging technologies.
Existing ReRAM-based processing-in-memory (PIM) designs can support the inference and training of neural networks such as convolutional neural networks and recurrent neural networks. However, these designs suffer from the costly re-writing procedure required by the self-attention calculation, whose operands (the keys and values) change with every input and therefore cannot stay fixed in the crossbars the way ordinary weights do. I therefore propose an architecture that enables an efficient self-attention mechanism in PIM designs: an optimized calculation procedure and a finer-granularity pipeline improve efficiency. The contribution lies in enabling feasible and efficient ReRAM-based PIM designs for attention-based models.
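For illustration only, the following Python sketch (a simplified crossbar model with assumed dimensions, not the dissertation's architecture) shows where the re-writing cost comes from: the projection weights stay fixed in crossbars, but the score and output products multiply input-dependent matrices that a naive PIM mapping would have to re-program for every new sequence.

import numpy as np

def crossbar_matvec(G, v):
    # Analog matrix-vector multiply: the matrix lives in the crossbar
    # as conductances G, so changing it requires re-programming
    # (writing) the ReRAM cells.
    return G.T @ v

d, n_tokens = 64, 8
rng = np.random.default_rng(0)
# Projection weights are static: programmed into crossbars once.
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))

def self_attention(tokens):
    Q = np.stack([crossbar_matvec(W_q, x) for x in tokens])
    K = np.stack([crossbar_matvec(W_k, x) for x in tokens])
    V = np.stack([crossbar_matvec(W_v, x) for x in tokens])
    # Q @ K.T and w @ V multiply *input-dependent* matrices: a naive
    # PIM mapping must write K and V into crossbars for every new
    # sequence, which is the costly re-writing step.
    scores = Q @ K.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

print(self_attention(rng.standard_normal((n_tokens, d))).shape)  # (8, 64)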
Inference with ReRAM-based designs has one severe problem: accuracy can be degraded by non-idealities in the hardware devices, and the robustness of previous methods has not been validated under combined sources of stochastic device noise. The proposed hardware-aware training method improves the robustness of inference accuracy. In addition, with hardware efficiency and inference robustness as joint targets, a multi-objective optimization method is developed to explore the design space and generate high-quality Pareto-optimal design configurations at minimal cost. This work integrates attributes from the design space and the evaluation space and develops efficient hardware-software co-design methods.
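As a rough sketch of the noise-injection idea behind hardware-aware training, assuming a multiplicative Gaussian device-noise model (the dissertation's actual noise model and training procedure may differ):

import numpy as np

rng = np.random.default_rng(1)

def noisy_forward(W, x, sigma=0.05):
    # Inject multiplicative device noise at read time, as if each
    # weight were stored in a stochastic ReRAM cell; sigma is an
    # assumed noise scale, not a value from the dissertation.
    W_dev = W * (1.0 + sigma * rng.standard_normal(W.shape))
    return np.maximum(W_dev @ x, 0.0)  # one ReLU layer as a stand-in

# Hardware-aware training samples fresh device noise at every forward
# pass, so gradient descent settles in a weight region whose accuracy
# is insensitive to read-time perturbations.
W = 0.1 * rng.standard_normal((16, 32))
x = rng.standard_normal(32)
outputs = np.stack([noisy_forward(W, x) for _ in range(100)])
print("output spread under device noise:", outputs.std(axis=0).mean())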
Training with ReRAM-based designs faces a challenging endurance problem because neural network training requires frequent weight updates. Endurance management therefore aims to reduce the number of weight updates and to balance write accesses across cells. The proposed endurance-aware training method applies gradient structure pruning and dynamically adjusts the write probabilities in a structured manner, expanding the ReRAM life cycle during the training process.
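A minimal sketch of these two ideas together; the row granularity, write probabilities, and decay factor here are illustrative assumptions, not the dissertation's settings:

import numpy as np

rng = np.random.default_rng(2)

def endurance_aware_update(W, grad, write_prob, lr=0.01, keep_rows=4):
    # (a) Structured gradient pruning: write only the rows with the
    # largest gradient norms, reducing the number of weight updates.
    row_norms = np.linalg.norm(grad, axis=1)
    top_rows = np.argsort(row_norms)[-keep_rows:]
    # (b) Probabilistic write masking for wear leveling: rows that
    # have been written often are updated less frequently.
    chosen = top_rows[rng.random(keep_rows) < write_prob[top_rows]]
    W[chosen] -= lr * grad[chosen]
    write_prob[chosen] *= 0.9  # lower the future write chance
    return chosen

W = rng.standard_normal((16, 8))
write_prob = np.ones(16)  # per-row write probabilities
grad = rng.standard_normal((16, 8))
print("rows written this step:", endurance_aware_update(W, grad, write_prob))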
In summary, the research above targets an efficient self-attention mechanism and addresses the accuracy-degradation and endurance problems of the inference and training processes. The effort lies in identifying the challenging part of each topic and developing hardware-software co-designs that account for both efficiency and robustness. The developed designs are potential solutions to the challenging problems of in-memory computing in emerging technologies.
Type
Dissertation
Permalink
https://hdl.handle.net/10161/29125
Citation
Yang, Xiaoxuan (2023). Improving the Efficiency and Robustness of In-Memory Computing in Emerging Technologies. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/29125.