Improving the Efficiency and Robustness of In-Memory Computing in Emerging Technologies
Date
2023
Authors
Yang, Xiaoxuan
Abstract
Emerging technologies, such as resistive random-access memory (ReRAM), have demonstrated their potential for in-memory computing in deep learning applications. My dissertation work focuses on improving the efficiency and robustness of in-memory computing built on these emerging technologies.
Existing ReRAM-based processing-in-memory (PIM) designs can support the inference and training of neural networks such as convolutional neural networks and recurrent neural networks. However, these designs suffer from the costly re-writing procedure required by the self-attention calculation, whose operands (the keys and values) change with every input and therefore cannot stay fixed in the crossbars the way ordinary weights do. I therefore propose an architecture that enables an efficient self-attention mechanism in PIM designs: an optimized calculation procedure and a finer-granularity pipeline improve efficiency. The contribution lies in enabling feasible and efficient ReRAM-based PIM designs for attention-based models.
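For illustration only, the following Python sketch (a simplified crossbar model with assumed dimensions, not the dissertation's architecture) shows where the re-writing cost comes from: the projection weights stay fixed in crossbars, but the score and output products multiply input-dependent matrices that a naive PIM mapping would have to re-program for every new sequence.

import numpy as np

def crossbar_matvec(G, v):
    # Analog matrix-vector multiply: the matrix lives in the crossbar
    # as conductances G, so changing it requires re-programming
    # (writing) the ReRAM cells.
    return G.T @ v

d, n_tokens = 64, 8
rng = np.random.default_rng(0)
# Projection weights are static: programmed into crossbars once.
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))

def self_attention(tokens):
    Q = np.stack([crossbar_matvec(W_q, x) for x in tokens])
    K = np.stack([crossbar_matvec(W_k, x) for x in tokens])
    V = np.stack([crossbar_matvec(W_v, x) for x in tokens])
    # Q @ K.T and w @ V multiply *input-dependent* matrices: a naive
    # PIM mapping must write K and V into crossbars for every new
    # sequence, which is the costly re-writing step.
    scores = Q @ K.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

print(self_attention(rng.standard_normal((n_tokens, d))).shape)  # (8, 64)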
Inference with ReRAM-based designs has one severe problem: accuracy can be degraded by non-idealities in the hardware devices, and the robustness of previous methods has not been validated under combined sources of stochastic device noise. The proposed hardware-aware training method improves the robustness of inference accuracy. In addition, with hardware efficiency and inference robustness as joint targets, a multi-objective optimization method is developed to explore the design space and generate high-quality Pareto-optimal design configurations at minimal cost. This work integrates attributes from the design space and the evaluation space and develops efficient hardware-software co-design methods.
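As a rough sketch of the noise-injection idea behind hardware-aware training, assuming a multiplicative Gaussian device-noise model (the dissertation's actual noise model and training procedure may differ):

import numpy as np

rng = np.random.default_rng(1)

def noisy_forward(W, x, sigma=0.05):
    # Inject multiplicative device noise at read time, as if each
    # weight were stored in a stochastic ReRAM cell; sigma is an
    # assumed noise scale, not a value from the dissertation.
    W_dev = W * (1.0 + sigma * rng.standard_normal(W.shape))
    return np.maximum(W_dev @ x, 0.0)  # one ReLU layer as a stand-in

# Hardware-aware training samples fresh device noise at every forward
# pass, so gradient descent settles in a weight region whose accuracy
# is insensitive to read-time perturbations.
W = 0.1 * rng.standard_normal((16, 32))
x = rng.standard_normal(32)
outputs = np.stack([noisy_forward(W, x) for _ in range(100)])
print("output spread under device noise:", outputs.std(axis=0).mean())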
Training with ReRAM-based designs faces a challenging endurance problem because neural network training requires frequent weight updates. Endurance management therefore aims to reduce the number of weight updates and to balance write accesses across cells. The proposed endurance-aware training method applies gradient structure pruning and dynamically adjusts the write probabilities in a structured manner, expanding the ReRAM life cycle during the training process.
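A minimal sketch of these two ideas together; the row granularity, write probabilities, and decay factor here are illustrative assumptions, not the dissertation's settings:

import numpy as np

rng = np.random.default_rng(2)

def endurance_aware_update(W, grad, write_prob, lr=0.01, keep_rows=4):
    # (a) Structured gradient pruning: write only the rows with the
    # largest gradient norms, reducing the number of weight updates.
    row_norms = np.linalg.norm(grad, axis=1)
    top_rows = np.argsort(row_norms)[-keep_rows:]
    # (b) Probabilistic write masking for wear leveling: rows that
    # have been written often are updated less frequently.
    chosen = top_rows[rng.random(keep_rows) < write_prob[top_rows]]
    W[chosen] -= lr * grad[chosen]
    write_prob[chosen] *= 0.9  # lower the future write chance
    return chosen

W = rng.standard_normal((16, 8))
write_prob = np.ones(16)  # per-row write probabilities
grad = rng.standard_normal((16, 8))
print("rows written this step:", endurance_aware_update(W, grad, write_prob))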
In summary, the research above targets an efficient self-attention mechanism and addresses the accuracy-degradation and endurance problems of the inference and training processes. The effort lies in identifying the challenging part of each topic and developing hardware-software co-designs that account for both efficiency and robustness. The developed designs are potential solutions to the challenging problems of in-memory computing in emerging technologies.
Type
Dissertation
Permalink
https://hdl.handle.net/10161/29125
Citation
Yang, Xiaoxuan (2023). Improving the Efficiency and Robustness of In-Memory Computing in Emerging Technologies. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/29125.