Near Data Processing for Data-Intensive Machine Learning Workloads

Date

2025

Abstract

The rapid growth of data-driven applications—spanning personalized recommendation, large-scale machine learning, federated learning, and vector databases—places unprecedented pressure on traditional computer architectures. Conventional systems struggle under the intense memory bandwidth demands and irregular data access patterns of modern workloads. Moreover, limited concurrency in memory hierarchies, high off-chip data movement costs, and the difficulty of scaling to massive, heterogeneous deployments all contribute to performance bottlenecks and energy inefficiencies. These challenges are further amplified by the need for real-time or near-real-time processing, as well as the emerging requirement for large-capacity and high-throughput approximate nearest neighbor search (ANNS) in retrieval-augmented generation for large language models (LLMs). Collectively, these trends expose the critical limitations of existing architectures, revealing a clear need for near-data processing (NDP) solutions that bring computation closer to where data resides.

In this dissertation, Near Data Processing for Data-Intensive Machine Learning Workloads, I present a suite of hardware–software co-design techniques to address these architectural challenges and optimize performance and energy efficiency for data-intensive tasks. First, I introduce ReRec, a ReRAM-based accelerator specialized for sparse embedding lookups in recommendation models. By refining crossbar designs and incorporating an access-aware mapping algorithm, ReRec achieves up to a 29.26× throughput improvement over CPU baselines while mitigating latency and resource under-utilization in fine-grained operations. Next, to tackle the significant gap between compute performance and data transfer speeds, I propose ICGMM, a hardware-driven cache management framework based on Gaussian Mixture Models for Compute Express Link (CXL)-based memory expansion. Prototyped on an FPGA, ICGMM dramatically reduces cache miss rates and average access latency compared to traditional caching policies, with markedly lower hardware overhead. Building on these insights, I develop EMS-I, an efficient memory system that integrates SSDs via CXL for large-scale recommendation models such as deep learning recommendation models (DLRMs). By tailoring caching and prefetching mechanisms to data access patterns, EMS-I reduces memory costs while delivering performance comparable to state-of-the-art NDP solutions—at substantially lower energy consumption.

Beyond recommendation tasks, I address the scalability and heterogeneity issues in federated learning through FedRepre, a framework that accelerates global model convergence via a bi-level active client selection strategy. Enhanced by a specialized server architecture and a unified CXL-based memory pool, FedRepre reduces training time by up to 19.54× while improving model accuracy under real-world FL constraints. I also extend the co-design philosophy to NDSearch, an ANNS solution critical for vector databases and retrieval-augmented generation in LLMs. By leveraging a near-data processing architecture within NAND flash, NDSearch exploits internal parallelism to achieve speedups exceeding 30× over CPU baselines, alongside orders-of-magnitude gains in energy efficiency.
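
To make the cache-management idea behind ICGMM more concrete, the following is a minimal illustrative sketch, in Python with scikit-learn, of using a Gaussian Mixture Model to separate "hot" from "cold" pages based on simple access statistics. The feature choices, the function name classify_hot_pages, and the rule for picking the hot component are assumptions for illustration only; the actual framework is a hardware design prototyped on an FPGA, not this software model.

# Illustrative sketch only (not the ICGMM implementation): fit a two-component
# Gaussian Mixture Model to per-page access statistics and label the component
# with the higher mean access count as "hot". Feature choices are hypothetical.
import numpy as np
from sklearn.mixture import GaussianMixture

def classify_hot_pages(access_features: np.ndarray) -> np.ndarray:
    """access_features: (n_pages, 2) array of [access count, mean reuse distance]."""
    gmm = GaussianMixture(n_components=2, random_state=0).fit(access_features)
    labels = gmm.predict(access_features)
    hot_component = int(np.argmax(gmm.means_[:, 0]))  # higher mean access count
    return labels == hot_component  # boolean mask: True = keep in the fast tier

# Synthetic example: 900 rarely touched pages and 100 frequently reused pages.
rng = np.random.default_rng(0)
cold = np.column_stack([rng.poisson(2, 900), rng.uniform(100, 1000, 900)])
hot = np.column_stack([rng.poisson(50, 100), rng.uniform(1, 50, 100)])
features = np.vstack([cold, hot]).astype(float)
print("pages classified as hot:", int(classify_hot_pages(features).sum()))

In a CXL setting, such a hot/cold decision would steer page placement between local DRAM and the expanded memory tier; here it only produces labels.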
Collectively, these projects illustrate how holistic hardware–software co-design and near data processing strategies—encompassing ReRAM accelerators, in-storage computing, and CXL-based memory systems—can overcome the persistent challenges of bandwidth-intensive, latency-sensitive, and large-scale machine learning applications. This work provides a promising roadmap toward faster, more efficient, and more scalable computing systems in an era of ever-growing data demands.
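
As a rough software analogy for the in-flash parallelism that NDSearch exploits, the sketch below partitions a vector database across independent workers, standing in for NAND channels, runs a brute-force top-k search in each partition, and merges the partial results. The thread-pool parallelism, function names, and exhaustive distance kernel are illustrative assumptions and do not reflect the actual in-storage architecture or its ANNS algorithm.

# Illustrative sketch only: per-partition top-k search merged across "channels".
import heapq
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def topk_in_partition(part, ids, query, k):
    # Brute-force L2 distances within one partition; in an in-storage design this
    # work happens near the flash dies instead of crossing the storage interface.
    dists = np.linalg.norm(part - query, axis=1)
    order = np.argsort(dists)[:k]
    return list(zip(dists[order].tolist(), ids[order].tolist()))

def parallel_ann_search(database, query, k=10, channels=8):
    parts = np.array_split(database, channels)
    id_parts = np.array_split(np.arange(len(database)), channels)
    with ThreadPoolExecutor(max_workers=channels) as pool:
        partials = pool.map(lambda p: topk_in_partition(p[0], p[1], query, k),
                            zip(parts, id_parts))
    merged = [pair for partial in partials for pair in partial]
    return heapq.nsmallest(k, merged)  # list of (distance, vector id)

rng = np.random.default_rng(1)
db = rng.standard_normal((100_000, 64)).astype(np.float32)
q = rng.standard_normal(64).astype(np.float32)
print(parallel_ann_search(db, q, k=5))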

Subjects

Computer engineering

Citation

Wang, Yitu (2025). Near Data Processing for Data-Intensive Machine Learning Workloads. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/32710.


Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.