In-Memory Computing Architecture for Deep Learning Acceleration

dc.contributor.advisor

Chen, Yiran

dc.contributor.advisor

Li, Hai

dc.contributor.author

Chen, Fan

dc.date.accessioned

2021-01-12T22:25:47Z

dc.date.available

2021-07-11T08:17:09Z

dc.date.issued

2020

dc.department

Electrical and Computer Engineering

dc.description.abstract

The ever-increasing demands of deep learning applications, especially the more powerful but compute-intensive unsupervised deep learning models, overwhelm the computation, communication, and storage capabilities of modern general-purpose CPUs and GPUs. To accommodate these memory and computing requirements, multi-core systems that make intensive use of accelerators are becoming the future of computing. Such novel computing systems incur new challenges, including architectural support for model training in accelerators, large cache demands of multi-core processors, and system performance, energy, and efficiency. In this thesis, I present my research addressing these challenges by leveraging emerging memory and logic devices as well as advanced integration technologies. In the first work, I present ReGAN, the first training accelerator architecture for unsupervised deep learning. ReGAN follows the processing-in-memory strategy, leveraging the energy efficiency of resistive memory arrays for in-situ deep learning execution, and I propose an efficient pipelined training procedure to reduce on-chip memory accesses. In the second work, I present ZARA, which addresses the resource underutilization caused by a new operator, transposed convolution, used in unsupervised learning models; ZARA improves system efficiency through a novel computation deformation technique. In the third work, I present MARVEL, which targets the power efficiency of previous resistive accelerators. MARVEL leverages monolithic 3D integration technology by stacking multiple layers of low-power analog/digital conversion circuits implemented with carbon nanotube field-effect transistors, and replaces the area-consuming eDRAM buffers with dense cross-point Spin-Transfer Torque Magnetic RAM. I explore the design space and demonstrate that MARVEL provides further power-efficiency improvements as the number of integration layers increases. In the last work, I propose the first holistic solution for employing skyrmion racetrack memory as the last-level cache in future high-capacity cache designs. I first present a cache architecture and a physical-to-logic mapping scheme based on a comprehensive analysis of the working mechanism of skyrmion racetrack memory. I then model the impact of process variations and propose a process-variation-aware data management technique to minimize the resulting performance degradation.
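(The resource underutilization that ZARA targets comes from how transposed convolution is commonly lowered to a standard convolution over a zero-inserted input, so most multiply-accumulate operands are inserted zeros. The sketch below is illustrative only and not from the thesis: a hypothetical 1-D NumPy lowering whose function name, stride, and kernel are assumptions made for the example.)

```python
# Illustrative sketch (not the thesis design): lowering a stride-s
# transposed convolution to zero-insertion followed by a standard
# convolution. Most entries of the convolved buffer are inserted zeros,
# which is the wasted work a dense-convolution datapath would perform.
import numpy as np

def transposed_conv1d_via_zero_insertion(x, w, stride=2):
    """Hypothetical 1-D lowering: upsample x with zeros, then convolve with w."""
    # Insert (stride - 1) zeros between consecutive input elements.
    up = np.zeros(len(x) * stride - (stride - 1))
    up[::stride] = x
    # Pad by (kernel length - 1) so every output tap sees a full window.
    up = np.pad(up, len(w) - 1)
    # Slide the flipped kernel over the zero-inserted, padded buffer.
    out = np.array([np.dot(up[i:i + len(w)], w[::-1])
                    for i in range(len(up) - len(w) + 1)])
    # Fraction of the buffer that is inserted zeros or padding, i.e. the
    # share of multiply operands that contribute nothing to the output.
    zero_fraction = 1.0 - len(x) / len(up)
    return out, zero_fraction

y, frac = transposed_conv1d_via_zero_insertion(
    np.array([1.0, 2.0, 3.0, 4.0]), np.array([0.5, 1.0, 0.5]))
print(y)                                     # upsampled, smoothed output
print(f"~{frac:.0%} of operands are zeros")  # motivation for ZARA's reshaping
```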

dc.identifier.uri

https://hdl.handle.net/10161/22168

dc.subject

Computer engineering

dc.subject

Accelerators

dc.subject

Computer architecture

dc.subject

Deep learning

dc.subject

Emerging memory

dc.title

In-Memory Computing Architecture for Deep Learning Acceleration

dc.type

Dissertation

duke.embargo.months

5.884931506849314

Files

Original bundle

Name: Chen_duke_0066D_15926.pdf
Size: 6.88 MB
Format: Adobe Portable Document Format