Browsing by Subject "VLSI"
Item (Open Access): Hybrid Digital/Analog In-Memory Computing (2024), Zheng, Qilin

The relentless advancement of deep learning applications, particularly the highly potent yet computationally intensive deep unsupervised learning models, is pushing the boundaries of what modern general-purpose CPUs and GPUs can handle in terms of computation, communication, and storage capacities. To meet these burgeoning memory and computational demands, computing systems based on in-memory computing, which extensively utilize accelerators, are emerging as the next frontier in computing technology. This thesis delves into my research efforts aimed at overcoming these obstacles to develop a processing-in-memory based computing system tailored for machine learning tasks, with a focus on employing a hybrid digital/analog design approach.
In the initial part of my work, I introduce a novel concept that leverages hybrid digital/analog in-memory computing to enhance the efficiency of depth-wise convolution applications. This approach not only optimizes computational efficiency but also paves the way for more energy-efficient machine learning operations.
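For context on the operator named above (this sketch is illustrative, not code from the thesis): a depthwise convolution applies exactly one filter per input channel, so the multiply count per output pixel is C·K·K rather than the C_in·C_out·K·K of a standard convolution, which is what makes its hardware mapping distinctive. A minimal pure-Python version with invented example values:

```python
def depthwise_conv2d(x, kernels):
    """Depthwise 2D convolution (valid padding, stride 1).

    x: [C][H][W] input; kernels: [C][K][K], one filter per channel.
    Each channel is convolved only with its own filter.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    K = len(kernels[0])
    out_h, out_w = H - K + 1, W - K + 1
    out = [[[0.0] * out_w for _ in range(out_h)] for _ in range(C)]
    for c in range(C):
        for i in range(out_h):
            for j in range(out_w):
                acc = 0.0
                for di in range(K):
                    for dj in range(K):
                        acc += x[c][i + di][j + dj] * kernels[c][di][dj]
                out[c][i][j] = acc
    return out

# Two channels, 3x3 inputs, 2x2 averaging filters (illustrative values)
x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
     [[9, 8, 7], [6, 5, 4], [3, 2, 1]]]
k = [[[0.25, 0.25], [0.25, 0.25]],
     [[0.25, 0.25], [0.25, 0.25]]]
y = depthwise_conv2d(x, k)
print(y[0])  # [[3.0, 4.0], [6.0, 7.0]]
```

Because each channel's partial sums stay independent, depthwise layers reuse weights far less than standard convolutions, which is one reason their efficient mapping onto in-memory arrays is a research problem in its own right.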
Following this, I expand upon the initial concept by presenting a design methodology that applies hybrid digital/analog in-memory computing to the processing of sparse attention operators. This extension significantly improves mapping efficiency, making it a vital enhancement for the processing capabilities of deep learning models that rely heavily on attention mechanisms.
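For readers unfamiliar with the operator: sparse attention restricts each query to a subset of key positions, so only the unmasked scores are computed, stored, and mapped to hardware. A small illustrative sketch, assuming a per-query index set as the sparsity pattern (the thesis's actual formulation may differ):

```python
import math

def sparse_attention(Q, K, V, mask):
    """Toy sparse attention.

    Q, K, V: lists of d-dimensional vectors.
    mask[i]: set of key indices that query i may attend to; positions
    outside the set are simply skipped, which is the work a sparse
    accelerator avoids.
    """
    d = len(Q[0])
    out = []
    for i, q in enumerate(Q):
        idx = sorted(mask[i])
        # scaled dot-product scores, only for permitted key positions
        scores = [sum(q[t] * K[j][t] for t in range(d)) / math.sqrt(d)
                  for j in idx]
        # numerically stable softmax over the sparse score set
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(w[a] / z * V[j][t] for a, j in enumerate(idx))
                    for t in range(d)])
    return out

Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 1.0], [2.0, 2.0]]
print(sparse_attention(Q, K, V, mask=[{0}]))  # [[1.0, 1.0]]
```

The hardware-mapping challenge the abstract refers to comes from the irregular per-query index sets: dense in-memory arrays prefer regular tiles, so a sparse pattern must be packed or scheduled carefully to keep the arrays utilized.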
In my third piece of work, I detail the implementation strategies aimed at augmenting the power efficiency of in-memory computing macros. By integrating hybrid digital/analog computing concepts, this implementation focuses on general-purpose neural network acceleration, showcasing a significant step forward in reducing the energy consumption of such computational processes.
Lastly, I introduce a system-level simulation tool designed for simulating general-purpose in-memory-computing based systems. This tool facilitates versatile architecture exploration, allowing for the assessment and optimization of various configurations to meet the specific needs of machine learning workloads. Through these comprehensive research efforts, this thesis contributes to the advancement of in-memory computing technologies, offering novel solutions to the challenges posed by the next generation of machine learning applications.
Item (Embargo): Processing-in-Memory Accelerators Toward Energy-Efficient Real-World Machine Learning (2024), Kim, Bokyung

Artificial intelligence (AI) has permeated the real world, achieving unprecedented success. Countless applications exploit machine learning (ML) technologies built on big data and compute-intensive algorithms. Moreover, the pursuit of authentic machine intelligence is moving computing toward the edge to handle complex tasks conventionally reserved for human beings. Alongside this rapid development, the gap between the growing resource requirements of ML and the constrained environments of edge devices demands urgent attention to efficiency. To close this gap, solutions across different hardware disciplines are necessary beyond algorithm development.
Unfortunately, hardware development lags far behind because of heterogeneity. While the sensational advance of ML algorithms is a game-changer for computing paradigms, conventional hardware is ill-suited to these new paradigms due to fundamental limitations in its architecture and technology. The traditional architecture, which separates storage and computation, is deeply inefficient for the massive data movement and computation in ML algorithms, exhibiting high power consumption and low performance. Recognition of these fundamental limitations motivates efficient, non-conventional hardware accelerators.
As a new hardware paradigm, processing-in-memory (PIM) accelerators have raised significant expectations because they directly address the limitations of traditional hardware. PIM merges memory and processing units, saving the resources spent on data movement and computation, pursuing non-heterogeneity and ultimately improving efficiency. Previous PIM accelerators have shown promising outcomes in high-performance computing, thanks in particular to emerging memories such as memristors.
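To illustrate the analog principle behind PIM that both abstracts build on: a memristive crossbar computes a dot product by storing weights as conductances, applying inputs as voltages, summing the resulting column currents (Kirchhoff's current law), and digitizing the sum with an ADC. A toy model under idealized assumptions, with no device noise or nonlinearity and invented parameter values:

```python
def crossbar_mac(voltages, conductances, adc_levels=16, full_scale=8.0):
    """Idealized one-column analog multiply-accumulate.

    voltages: input activations applied to the rows.
    conductances: weights programmed into the column's cells.
    The column current is the analog dot product; a uniform ADC with
    `adc_levels` codes over `full_scale` quantizes the readout.
    """
    # Analog accumulation: each cell contributes I = V * G to the column
    current = sum(v * g for v, g in zip(voltages, conductances))
    # ADC: uniform quantization, clipped to the code range
    step = full_scale / adc_levels
    code = max(0, min(adc_levels - 1, int(current / step)))
    return code * step

print(crossbar_mac([1.0, 0.5, 1.0], [2.0, 2.0, 1.0]))  # 4.0
```

The ADC is typically the dominant power and area cost of such a macro, which is why hybrid digital/analog schemes, like those in the first abstract, try to reduce how much of the computation must pass through it.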
Despite its motivation toward non-heterogeneity, PIM-based designs have not fully escaped heterogeneity, which causes inefficiency and high costs. While emerging memories bring revolutions at the device and circuit levels, PIM at higher levels struggles with the variety of components in a system (horizontal heterogeneity). Furthermore, PIM is designed holistically across hierarchical levels (vertical heterogeneity), which complicates efficient design. Even robustness can be significantly influenced by heterogeneity.
Confronting these challenges in heterogeneity, efficiency, and robustness, my research has advanced PIM hardware through cross-layer designs for practically efficient ML acceleration. Specifically, focusing on architecture- and system-level innovations, I have pioneered novel 3D architectures and systemic paradigms that provide a strong foundation for future computing. For ML acceleration, I have proposed new methodologies to operate 3D architectures efficiently, along with a novel dataflow and a new 3D design that improve energy efficiency by pursuing non-heterogeneity. These innovations have been examined through rigorous hardware experiments, and their practical efficiency has been demonstrated with a fabricated chip for seizure classification, a real-world application. In response to the needs of future ML, my research is evolving to achieve robustness in hardware ML platforms. In this dissertation, I summarize these research contributions, drawing on design experience that spans architecture and system design to chip fabrication.