Browsing by Subject "Deep Neural Networks"
Item (Open Access): Joint Optimization of Algorithms, Hardware, and Systems for Efficient Deep Neural Networks (2024). Li, Shiyu.

Deep learning has enabled remarkable performance breakthroughs across various domains, including computer vision, natural language processing, and recommender systems. However, the typical deep neural network (DNN) models employed in these applications require millions of parameters and billions of operations, leading to substantial computational and memory requirements. While researchers have proposed compression methods, optimized frameworks, and specialized accelerators to improve efficiency, outstanding challenges persist, limiting the achievable gains.
A fundamental challenge lies in the inherent irregularity and sparsity of DNNs. Although these models exhibit significant sparsity, with a considerable fraction of weights and activations being zero or near-zero values, exploiting this sparsity efficiently on modern hardware is problematic due to the irregular distribution of non-zero elements. This irregularity leads to substantial overhead in indexing, gathering, and processing sparse data, resulting in poor utilization of computational and memory resources. Furthermore, recent research has identified a significant gap between the theoretical and practical improvements achieved by compression methods. Additionally, emerging DNN architectures with novel operators often nullify previous optimization efforts in software frameworks and hardware accelerators, necessitating continuous adaptation.
To address these critical challenges, this dissertation develops a holistic approach that jointly optimizes algorithms, hardware architectures, and system designs to enable efficient deployment of DNNs in the presence of irregularity and sparsity. At the algorithm level, a novel hardware-friendly compression method based on matrix decomposition is proposed. The original convolutional kernels are decomposed into common basis kernels and a series of coefficients, with conventional pruning applied to the coefficients. The compressed DNN has a hardware-friendly structure in which the sparsity pattern is shared across input feature map pixels, reducing the cost of processing sparse patterns.
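The following is a minimal sketch of this kind of basis-kernel compression, using a plain truncated SVD as the matrix-decomposition step and magnitude pruning on the coefficients; the function name, shapes, and pruning ratio are illustrative assumptions, not the dissertation's actual implementation.

import numpy as np

def decompose_conv_weights(weights, num_basis, prune_ratio=0.7):
    """weights: (out_ch, in_ch, k, k) convolutional kernels."""
    out_ch, in_ch, k, _ = weights.shape
    # Flatten each kernel into a row so the whole layer becomes a 2-D matrix.
    flat = weights.reshape(out_ch * in_ch, k * k)
    # Truncated SVD: flat ~= coeffs @ basis.
    u, s, vt = np.linalg.svd(flat, full_matrices=False)
    basis = vt[:num_basis]                      # shared basis kernels (kept dense)
    coeffs = u[:, :num_basis] * s[:num_basis]   # per-kernel coefficients
    # Magnitude pruning on the coefficients only; the resulting sparsity
    # pattern is shared across all spatial positions of the feature map.
    thresh = np.quantile(np.abs(coeffs), prune_ratio)
    coeffs[np.abs(coeffs) < thresh] = 0.0
    return coeffs.reshape(out_ch, in_ch, num_basis), basis.reshape(num_basis, k, k)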
At the hardware level, a novel sparse DNN accelerator is introduced to support inference of the compressed DNN. Low-precision quantization is applied to the sparse coefficients and high-precision quantization to the basis kernels. Because only the low-precision coefficients are involved in sparse processing, the hardware can efficiently match non-zero weights and activations using inverted butterfly networks. The shared basis kernels and sparse coefficients significantly reduce buffer size and bandwidth requirements, boosting performance and energy efficiency.
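Below is a minimal sketch of the mixed-precision idea described above: aggressive low-bit symmetric quantization for the sparse coefficients that enter the matching datapath, and a wider format for the shared basis kernels. The bit widths and function names are assumptions for illustration, not the accelerator's actual datapath.

import numpy as np

def quantize_symmetric(x, bits):
    # Symmetric linear quantization to signed integers with the given bit width.
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax if np.any(x) else 1.0
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)
    return q, scale

def quantize_layer(coeffs, basis, coeff_bits=4, basis_bits=16):
    # Only the sparse coefficients are matched against activations, so they
    # receive the low-bit format; the dense basis kernels stay high precision.
    q_coeffs, s_c = quantize_symmetric(coeffs, coeff_bits)
    q_basis, s_b = quantize_symmetric(basis, basis_bits)
    return (q_coeffs, s_c), (q_basis, s_b)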
At the system level, a near-data processing framework is proposed to address the challenge of training large DNN-based recommendation models. This framework adopts computational storage devices and coherent system interconnects to partition the model into subtasks. Data-intensive embedding operations run on computational storage devices with customized memory hierarchies, while compute-intensive feature processing and aggregation operations are assigned to GPUs for maximum efficiency. This framework enables training large DNN-based recommendation models without expensive hardware investments.
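As a rough illustration of this partitioning, the sketch below routes the memory-bound embedding lookups to one device (standing in for the computational storage) and keeps the compute-intensive MLP on the GPU; the class, device strings, and layer sizes are hypothetical placeholders, not the framework's actual API.

import torch
import torch.nn as nn

class PartitionedRecModel(nn.Module):
    def __init__(self, num_embeddings, embed_dim, mlp_dims, storage_device="cpu", gpu="cuda"):
        super().__init__()
        # Stand-in for embedding tables resident on the computational storage device.
        self.embedding = nn.EmbeddingBag(num_embeddings, embed_dim, mode="sum").to(storage_device)
        layers, in_dim = [], embed_dim
        for d in mlp_dims:
            layers += [nn.Linear(in_dim, d), nn.ReLU()]
            in_dim = d
        self.mlp = nn.Sequential(*layers).to(gpu)  # compute-intensive part on the GPU
        self.gpu = gpu

    def forward(self, sparse_ids, offsets):
        # Data-intensive lookup runs near the data; only the pooled embeddings
        # cross the interconnect to reach the GPU-resident MLP.
        pooled = self.embedding(sparse_ids, offsets)
        return self.mlp(pooled.to(self.gpu))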
Through joint optimization across algorithms, hardware architectures, and system designs, this research aims to overcome the limitations imposed by irregularity and sparsity, enabling efficient deployment of DNNs in a broad range of applications and resource-constrained environments. By addressing these critical issues, this work paves the way for fully harnessing the potential of deep learning technologies in practical settings.
Item (Open Access): Practical Solutions to Neural Architecture Search on Applied Machine Learning (2024). Zhang, Tunhou.

The advent of Artificial Intelligence (AI) propels the real world into a new era characterized by remarkable design innovations and groundbreaking design automation, primarily fueled by Deep Neural Networks (DNN). At the heart of this transformation is the progress in Automated Machine Learning (AutoML), notably Neural Architecture Search (NAS). NAS lays a robust foundation for developing algorithms capable of automating design processes to determine the optimal architecture for academic benchmarks. However, the real challenge emerges when adapting NAS for Applied Machine Learning (AML) scenarios: navigating the complex terrain of design space exploration and exploitation. This complexity arises due to the heterogeneity of data and architectures required by real-world AML problems, an aspect that traditional NAS approaches struggle to address fully.
To bridge this gap, our research emphasizes creating a flexible search space that reduces reliance on human-derived architectural assumptions. We introduce innovative techniques aimed at refining search algorithms to accommodate greater flexibility. By carefully examining and enhancing search spaces and methodologies, we empower NAS solutions to cater to practical AML problems. This enables the exploration of broader search spaces, unlocks better performance potential, and lowers the cost of the search process.
We start by challenging homogeneous search space design for multi-modality 3D representations, proposing "PIDS" to enable joint dimension and interaction search for 3D point cloud segmentation. We consider two axes along which point cloud operators are adapted to multi-modality data with density, geometry, and order varieties, achieving significant mIoU improvements on segmentation benchmarks over state-of-the-art 3D models.

To apply our approach efficiently to recommendation systems, we develop "NASRec" to support heterogeneous building operators and propose practical solutions that improve the quality of NAS for Click-Through Rate (CTR) prediction. We propose an end-to-end full-architecture search with minimal human priors and provide practical solutions to the scalability and heterogeneity challenges in NAS, outperforming manually designed models and existing NAS models on various CTR benchmarks.

Finally, we pioneer efforts on industry-scale CTR benchmarks and propose DistDNAS to optimize search and serving efficiency, producing smaller and better recommendation models on a large-scale CTR benchmark. Motivated by the discoveries in NAS, we additionally uncover the underlying theoretical foundations of residual learning in computer vision foundation research and envision the prospects of our research for Artificial Intelligence, including Large Language Models, Generative AI, and beyond.
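As a rough illustration of what a heterogeneous, full-architecture search space can look like, the sketch below samples per-block operator types and widths and runs a plain random search; the operator names, value ranges, and evaluate callback are illustrative assumptions, not the search spaces or algorithms used by PIDS, NASRec, or DistDNAS.

import random

SEARCH_SPACE = {
    "operator": ["linear", "dot_product", "attention", "elementwise_sum"],
    "hidden_dim": [64, 128, 256, 512],
    "num_blocks": [2, 3, 4, 5, 6],
}

def sample_architecture():
    # Each block independently chooses its operator type and width,
    # so sampled architectures are heterogeneous across blocks.
    n = random.choice(SEARCH_SPACE["num_blocks"])
    return [
        {"operator": random.choice(SEARCH_SPACE["operator"]),
         "hidden_dim": random.choice(SEARCH_SPACE["hidden_dim"])}
        for _ in range(n)
    ]

def random_search(evaluate, num_trials=50):
    # evaluate(arch) -> validation metric (e.g., AUC on a CTR benchmark).
    best_arch, best_score = None, float("-inf")
    for _ in range(num_trials):
        arch = sample_architecture()
        score = evaluate(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score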