In-Memory Computing Architecture for Deep Learning Acceleration
The ever-increasing demands of deep learning applications, especially the more powerful but computationally intensive unsupervised deep learning models, overwhelm the computation, communication, and storage capabilities of modern general-purpose CPUs and GPUs. To accommodate these memory and computing requirements, multi-core systems that make intensive use of accelerators are becoming the future of computing. Such novel computing systems incur new challenges, including architectural support for model training in the accelerators, large cache demands for multi-core processors, system performance, energy, and efficiency. In this thesis, I present my research addressing these challenges by leveraging emerging memory and logic devices, as well as advanced integration technologies. In the first work, I present the first training accelerator architecture, ReGAN, for unsupervised deep learning. ReGAN follows the processing-in-memory strategy by leveraging the energy efficiency of resistive memory arrays for in-situ deep learning execution. I propose an efficient pipelined training procedure to reduce on-chip memory access. In the second work, I present ZARA to address the resource underutilization caused by a new operator, namely transposed convolution, used in unsupervised learning models. ZARA improves system efficiency through a novel computation deformation technique. In the third work, I present MARVEL, which aims to improve on the power efficiency of previous resistive accelerators. MARVEL leverages monolithic 3D integration technology by stacking multiple layers of low-power analog/digital conversion circuits implemented with carbon nanotube field-effect transistors. The area-consuming eDRAM buffers are replaced by dense cross-point spin-transfer torque magnetic RAM (STT-MRAM). I explore the design space and demonstrate that MARVEL's power efficiency improves further as the number of integration layers increases.
In the last piece of work, I propose the first holistic solution for employing skyrmion racetrack memory as last-level caches for future high-capacity cache design. I first present a cache architecture and a physical-to-logic mapping scheme based on a comprehensive analysis of the working mechanism of skyrmion racetrack memory. I then model the impact of process variations and propose a process-variation-aware data management technique to minimize the performance degradation they cause.
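As a rough illustration of why transposed convolution can leave compute resources underutilized, the sketch below computes a 1D transposed convolution in two ways: directly, and by inserting zeros between input elements and then running an ordinary convolution. It assumes that the underutilization mentioned above stems from multiplications against inserted or padded zeros; the abstract does not state this, and the code is not ZARA's computation deformation technique. The function names and the utilization metric are hypothetical and chosen only for this example.

```python
def transposed_conv1d_direct(x, w, stride):
    """Scatter form: each input element adds a scaled copy of the kernel."""
    out = [0.0] * ((len(x) - 1) * stride + len(w))
    for i, xi in enumerate(x):
        for k, wk in enumerate(w):
            out[i * stride + k] += xi * wk
    return out


def transposed_conv1d_zero_insert(x, w, stride):
    """Zero-insertion form: upsample with zeros, pad, then slide the
    flipped kernel as in an ordinary convolution.

    Returns the output and the fraction of multiplications whose input
    operand is a real input value (not an inserted or padded zero).
    """
    k = len(w)
    up, is_real = [], []
    for i, xi in enumerate(x):
        if i > 0:
            # stride - 1 zeros between neighbouring input elements
            up.extend([0.0] * (stride - 1))
            is_real.extend([False] * (stride - 1))
        up.append(xi)
        is_real.append(True)

    pad = k - 1
    up = [0.0] * pad + up + [0.0] * pad
    is_real = [False] * pad + is_real + [False] * pad

    flipped = w[::-1]
    out = []
    useful = total = 0
    for j in range(len(up) - k + 1):
        acc = 0.0
        for t in range(k):
            acc += up[j + t] * flipped[t]
            total += 1
            useful += is_real[j + t]
        out.append(acc)
    return out, useful / total


x = [1.0, 2.0, 3.0]
w = [1.0, 2.0, 1.0]
direct = transposed_conv1d_direct(x, w, stride=2)
via_zeros, utilization = transposed_conv1d_zero_insert(x, w, stride=2)
print(direct == via_zeros, round(utilization, 3))
```

Both forms yield the same output, but in the zero-insertion form only a fraction of the multiply operations touch real input values, and that fraction shrinks as the stride grows.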
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Rights for Collection: Duke Dissertations
Works are deposited here by their authors, and represent their research and opinions, not those of Duke University. Some materials and descriptions may include offensive content.