Browsing by Author "Li, Hai"
Item Open Access: Accelerator Architectures for Deep Learning and Graph Processing (2020). Song, Linghao.
Deep learning and graph processing are two big-data applications and they are widely applied in many domains. The training of deep learning is essential for inference and has not yet been fully studied. With forward data propagation, backward error propagation, and gradient calculation, deep learning training is a more complicated process with higher computation and communication intensity. Distributing computations across multiple heterogeneous accelerators to achieve high throughput and balanced execution, however, remains challenging. In this dissertation, I present AccPar, a principled and systematic method of determining the tensor partition across multiple heterogeneous accelerators for efficient training acceleration. Emerging resistive random access memory (ReRAM) is promising for processing in memory (PIM). For high-throughput training acceleration in ReRAM-based PIM accelerators, I present PipeLayer, an architecture for layer-wise pipelined parallelism. Graph processing is well known for poor locality and high memory bandwidth demand. In conventional architectures, graph processing incurs a significant amount of data movement and energy consumption. I present GraphR, the first ReRAM-based graph processing accelerator, which follows the principle of near-data processing and explores the opportunity of performing massively parallel analog operations at low hardware and energy cost. Sparse matrix-vector multiplication (SpMV), a subset of graph processing, is the key computation in iterative solvers for scientific computing. Efficiently accelerating floating-point processing in ReRAM remains a challenge. In this dissertation, I present ReFloat, a data format and supporting accelerator architecture for low-cost floating-point processing in ReRAM for scientific computing.
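As background for the SpMV workload that ReFloat targets, the sketch below shows sparse matrix-vector multiplication inside a simple Jacobi iteration; the use of SciPy's CSR format and the Jacobi solver are illustrative assumptions, not the ReFloat format or accelerator itself.

```python
import numpy as np
import scipy.sparse as sp

def jacobi(A_csr, b, iters=100):
    """Jacobi iteration: each step is dominated by one SpMV (R @ x)."""
    D = A_csr.diagonal()                  # diagonal of A
    R = A_csr - sp.diags(D)               # off-diagonal remainder
    x = np.zeros_like(b)
    for _ in range(iters):
        x = (b - R @ x) / D                # SpMV is the hot loop
    return x

# Tiny example: diagonally dominant 3x3 system, so Jacobi converges
A = sp.csr_matrix(np.array([[4., 1., 0.], [1., 5., 2.], [0., 2., 6.]]))
b = np.array([1., 2., 3.])
print(jacobi(A, b))
```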
Item Open Access: Algorithm-hardware co-optimization for neural network efficiency improvement (2020). Yang, Qing.
Deep neural networks (DNNs) are widely applied in the artificial intelligence field. While the performance of DNNs is continuously improved by more complicated and deeper structures, the feasibility of deployment on edge devices remains a critical problem. In this thesis, we present algorithm-hardware co-optimization approaches to address the challenges of efficient DNN deployment from three aspects: 1) saving computational cost, 2) saving memory cost, and 3) reducing data movement.
First, we present a joint regularization technique to advance compression beyond the weights to neuron activations. By distinguishing and leveraging the significant difference among neuron responses and connections during learning, the jointly pruned network, namely JPnet, optimizes the sparsity of activations and weights. Second, to structurally regulate the dynamic activation sparsity (DAS), we propose a generic low-cost approach based on a winners-take-all (WTA) dropout technique. The network enhanced by the proposed WTA dropout, namely DASNet, features structured activation sparsity with an improved sparsity level, which can be easily utilized to achieve acceleration on conventional embedded systems. The effectiveness of JPnet and DASNet has been thoroughly evaluated on various network models with different activation functions and on different datasets. Third, we propose BitSystolic, a neural processing unit based on a systolic array structure, to fully support mixed-precision inference. In BitSystolic, the numerical precision of both weights and activations can be configured in the range of 2b~8b, fulfilling different requirements across mixed-precision models and tasks. Moreover, the design can support the various data flows present in different types of neural layers and adaptively optimize data reuse by switching between the matrix-matrix mode and the vector-matrix mode. We designed and fabricated the proposed BitSystolic in a 65nm process. Our measurement results show that BitSystolic features a unified power efficiency of up to 26.7 TOPS/W with 17.8 mW peak power consumption across various layer types. Finally, we take a brief look at computing-in-memory architectures based on resistive random-access memory (ReRAM), which realize in-place storage and computation. A quantized training method is proposed to enhance the accuracy of ReRAM-based neuromorphic systems by alleviating the impact of limited parameter precision.
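To make the winners-take-all idea concrete, the following is a minimal PyTorch-style sketch of a WTA activation layer that keeps only the top-k responses per sample and zeroes the rest; the layer name, the per-sample granularity, and the fixed keep ratio are illustrative assumptions rather than the exact DASNet formulation.

```python
import torch
import torch.nn as nn

class WTADropout(nn.Module):
    """Keep the top-k activations per sample; zero out the rest."""
    def __init__(self, keep_ratio=0.2):
        super().__init__()
        self.keep_ratio = keep_ratio

    def forward(self, x):
        flat = x.flatten(1)                              # (batch, features)
        k = max(1, int(self.keep_ratio * flat.size(1)))
        thresh = flat.topk(k, dim=1).values[:, -1:]      # per-sample k-th largest value
        mask = (flat >= thresh).float()
        return (flat * mask).view_as(x)

# Usage: insert after an activation, e.g.
# nn.Sequential(nn.Linear(128, 64), nn.ReLU(), WTADropout(keep_ratio=0.2))
```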
Item Open Access: Efficient and Scalable Deep Learning (2019). Wen, Wei.
Deep Neural Networks (DNNs) can achieve accuracy superior to traditional machine learning models because of their large learning capacity and the availability of large amounts of labeled data. In general, larger DNNs can obtain higher accuracy. However, two obstacles hinder us from building larger DNNs: (1) inference of large DNNs is slow, which limits their deployment on small devices; (2) training large DNNs is also slow, which slows down research exploration. To remove these obstacles, this dissertation focuses on acceleration of DNN inference and training. To accelerate DNN inference, original DNNs are compressed while preserving the original accuracy. More specifically, Structurally Sparse Deep Neural Networks (SSDNNs) are proposed to remove neural components. In Convolutional Neural Networks (CNNs), neurons, filters, channels, and layers can be removed; in Recurrent Neural Networks (RNNs), hidden sizes can be reduced. The study shows that SSDNNs can achieve higher speedup than sparse DNNs with non-structured sparsity. Besides SSDNNs, a Force Regularization is proposed to push DNNs toward a lower-rank space, such that DNNs can be decomposed into lower-rank architectures with fewer ranks than traditional methods obtain. The dissertation also demonstrates that SSDNNs and Force Regularization are orthogonal and can be combined for higher speedup. To accelerate DNN training, distributed deep learning is required. However, two problems hinder us from using more compute nodes for higher training speed: the Communication Bottleneck and the Generalization Gap. The Communication Bottleneck arises because communication time increases and dominates as distributed systems scale to many compute nodes. To reduce gradient communication in Stochastic Gradient Descent (SGD), SGD with low-precision gradients (TernGrad) is proposed. Moreover, in distributed deep learning, a large batch size is required to exploit system computing power; unfortunately, accuracy decreases when the batch size is very large, which is referred to as the Generalization Gap. One hypothesis to explain the Generalization Gap is that large-batch SGD gets stuck at sharp minima. The dissertation proposes a stochastic smoothing method (SmoothOut) to escape sharp minima. The dissertation shows that TernGrad overcomes the Communication Bottleneck and SmoothOut helps to close the Generalization Gap.
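For intuition, here is a minimal sketch of the gradient ternarization idea behind TernGrad: stochastic rounding of each gradient to {-s, 0, +s} with s = max|g|, so the quantized gradient is unbiased in expectation. The exact layer-wise scaling and clipping details of the published method may differ.

```python
import torch

def ternarize_grad(g):
    """Stochastically quantize a gradient tensor to {-s, 0, +s}, s = max|g|."""
    s = g.abs().max()
    if s == 0:
        return torch.zeros_like(g)
    p = g.abs() / s                         # probability of keeping magnitude s
    b = torch.bernoulli(p)                  # 1 with prob p, else 0
    return s * torch.sign(g) * b            # unbiased: E[result] = g

g = torch.randn(4, 4) * 0.01
print(ternarize_grad(g))                    # only three distinct values: -s, 0, +s
```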
Item Open Access: Efficient Deep Learning for Image Applications (2020). Wu, Chunpeng.
The breakthrough of deep learning (DL) has greatly promoted the development of machine learning in numerous academic disciplines and industries in recent years.
A subsequent concern, frequently raised by multidisciplinary researchers, software developers, and machine learning end users, is the inefficiency of DL methods: intolerable training and inference time, exhausted computing resources, and unsustainable power consumption.
To tackle these inefficiency issues, numerous DL efficiency methods have been proposed to improve efficiency without sacrificing prediction accuracy on a specified application such as image classification or visual object detection.
However, we argue that traditional DL efficiency methods are not sufficiently flexible or adaptive to meet the requirements of practical usage scenarios, based on two observations.
First, most traditional methods adopt the objective of "no accuracy loss for a specified application," yet this objective cannot cover many practical scenarios.
For example, to meet diverse user needs, a public cloud platform should provide an efficient and multipurpose DL method instead of focusing on a single application only.
Second, most traditional methods adopt model compression and quantization as efficiency enhancement strategies, yet these two strategies degrade severely in a considerable number of scenarios.
For example, for embedded deep neural networks (DNNs), significant architecture changes and quantization may severely weaken customized hardware accelerators designed for predefined DNN operators and precisions.
In this dissertation, we will investigate three popular usage scenarios and correspondingly propose our DL efficiency methods: versatile model efficiency, robust model efficiency, and processing-step efficiency.
The first scenario requires a DL method to achieve both model efficiency and versatility.
Model efficiency here means designing a compact deep neural network, while versatility means achieving satisfactory prediction accuracy on multiple applications.
We propose a compact DNN that integrates shape information into a newly designed module, Conv-M, to tackle the issue that previous compact DNNs cannot achieve a matched level of accuracy on both image classification and unsupervised domain adaptation.
Our method can benefit software developers, since they can directly replace an original single-purpose DNN with our versatile one in their programs.
The second scenario requires a DL method to achieve both model efficiency and robustness.
Robustness here means achieving satisfactory prediction accuracy on certain categories of samples.
These samples are critical but often wrongly predicted by previous methods.
We propose a fast training method based on simultaneous adaptive filter reuse (dynamic compression) and neuron-level robustness enhancement, to improve accuracy on self-driving motion prediction, especially the accuracy on night driving samples.
Our method can benefit algorithm researchers who are proficient in mathematically exploring loss functions but not skilled in empirically constructing efficient DNN sub-modules, since our dynamic compression does not require expertise in such sub-modules.
The third scenario requires the inference of a DL method to be fast without significantly changing the DNN architecture or adopting quantization.
We propose a fast photorealistic style transfer method by removing the time-consuming smoothing step during inference and introducing a spatially coherent content-style preserving loss during training.
For computer vision engineers who struggle to combine DL efficiency approaches, our method provides a candidate efficiency method different from the popular architecture tailoring and quantization.
Item Open Access: Efficient Neural Network Based Systems on Mobile and Cloud Platforms (2020). Mao, Jiachen.
In recent years, machine learning, especially neural networks, has gained unprecedented influence in both academia and industry.
The reason lies in the state-of-the-art performance of neural networks on many critical applications such as object detection, translation, and games. However, the deployment of neural network models on resource-constrained devices (e.g., edge devices) is challenged by their heavy memory and computing cost during execution. Many efforts have been made in the literature toward efficient execution of neural networks, from the perspectives of hardware, software, and algorithms.
My research during my Ph.D. study focuses mainly on software and algorithms targeting mobile platforms. More specifically, we emphasize the system design, system optimization, and model compression of neural networks for a better mobile user experience. From the system design perspective, we first propose MoDNN, a local distributed mobile computing system for DNN testing. MoDNN can partition already-trained DNN models onto several mobile devices to accelerate DNN computations by alleviating device-level computing cost and memory usage. Two model partition schemes are also designed to minimize non-parallel data delivery time, including both wakeup time and transmission time. Then, we propose AdaLearner, an adaptive local distributed mobile computing system for DNN training. To exploit the potential of our system, we adapt the neural network training phase to per-device resources and aggressively decrease the transmission overhead for better system scalability. From the system optimization perspective, we propose MobiEye, a cloud-based video detection system optimized for deployment in real-time mobile applications. MobiEye is based on a state-of-the-art video detection framework called Deep Feature Flow (DFF) and optimizes DFF with three system-level optimization methods. From the model compression perspective, we propose TPrune, a model analysis and pruning framework for the Transformer. In TPrune, we first propose Block-wise Structured Sparsity Learning (BSSL) to analyze Transformer model properties. Then, based on the characteristics derived from BSSL, we apply Structured Hoyer Square (SHS) to derive the final compressed models. The realization of these projects during my Ph.D. study contributes to current research on efficient neural network execution and thus results in more user-friendly and smart applications on edge devices for more users.
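As a toy illustration of the kind of layer partitioning MoDNN performs, the sketch below splits the output neurons of a fully connected layer across two workers and concatenates the partial results; the actual MoDNN partition schemes, which also balance wakeup and transmission time, are more elaborate, and the function name here is hypothetical.

```python
import numpy as np

def partitioned_linear(x, W, b, num_workers=2):
    """Split the output neurons of y = x @ W + b across workers, then concatenate."""
    splits = np.array_split(np.arange(W.shape[1]), num_workers)
    partial = []
    for cols in splits:                      # each chunk could run on a different device
        partial.append(x @ W[:, cols] + b[cols])
    return np.concatenate(partial, axis=-1)

x = np.random.randn(1, 8)
W = np.random.randn(8, 6)
b = np.random.randn(6)
assert np.allclose(partitioned_linear(x, W, b), x @ W + b)   # same result as the unsplit layer
```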
Item Open Access: Highly Efficient Neuromorphic Computing Systems With Emerging Nonvolatile Memories (2020). Yan, Bonan.
Emerging nonvolatile memory based hardware neuromorphic computing systems have enabled the implementation of general vector-matrix multiplication in a manner that fuses computation and memory at the same physical location. However, three major challenges remain in designing such neuromorphic computing systems for high efficiency in large-scale integration: (a) the analog/digital interface circuits dominate the power and area in such mixed-signal designs; (b) the systems are highly customized and, once developed, can only compute a single class of neural network models; (c) non-ideal device properties largely forfeit the benefits in computational efficiency.
Designs of mixed-signal interface circuitry have been extensively studied, but a holistic design approach to very-large-scale integration, spanning circuit design, microarchitecture, and hardware/software co-simulation, has been overlooked for emerging nonvolatile memory based neuromorphic computing systems. The realization of such neuromorphic computing platforms requires: (a) efficient interface circuits as well as execution models; (b) appropriate reconfigurability at runtime for different neural network architectures; and (c) reliability enhancement methods to resist imperfect fabrication and harsh working environments.
Motivated by these demands, this dissertation first introduces an implementation scheme for neuromorphic computing systems that uses emerging nonvolatile memory as synapses and CMOS integrated circuits as neurons. To save the energy consumption of data communication, the neuron circuits improve upon conventional integrate-and-fire neuron circuits for better current-to-spike conversion efficiency. Trade-offs between throughput and latency are investigated and validated with a prototype 64Kb resistive random access memory based in-memory computing processing engine.
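For intuition about the fused storage-and-compute operation such systems rely on, the following is a simple behavioral model of a ReRAM crossbar performing vector-matrix multiplication, with weights mapped to quantized conductances and Gaussian device noise added; the conductance range, number of levels, and noise magnitude are illustrative assumptions, not measurements from the 64Kb prototype.

```python
import numpy as np

def crossbar_vmm(v_in, weights, g_min=1e-6, g_max=1e-4, levels=16, noise_std=0.02):
    """Behavioral model: map weights to quantized conductances, compute bit-line currents."""
    w_min, w_max = weights.min(), weights.max()
    g_ideal = g_min + (weights - w_min) / (w_max - w_min + 1e-12) * (g_max - g_min)
    step = (g_max - g_min) / (levels - 1)
    g_quant = g_min + np.round((g_ideal - g_min) / step) * step     # programmable levels
    g_noisy = g_quant * (1 + noise_std * np.random.randn(*g_quant.shape))
    return v_in @ g_noisy                     # column currents = analog dot products

weights = np.random.randn(64, 32)             # 64 inputs (rows) x 32 outputs (columns)
v_in = np.random.rand(64)                      # input voltages on the word lines
print(crossbar_vmm(v_in, weights).shape)       # (32,) bit-line currents
```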
Next, this dissertation proposes a fully memristive neuromorphic computing system architecture that incorporates Mott memristors as the neuron circuits. The small footprint and intrinsic bionic dynamics of emerging memory based neuron circuits significantly reduce design complexity. This dissertation investigates and models the randomness that Mott memristors exhibit. By suppressing this randomness during inference and exploiting it during learning, the proposed system is optimized for a balance of inference accuracy and training efficiency.
Moreover, this dissertation advances the reconfigurability of emerging memory based neuromorphic computing systems by presenting a paradigm that supports post-fabrication switching between spiking and non-spiking neural network model execution. An improved version of time-to-first-spike temporal encoding is proposed that uses single spikes to accelerate execution.
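As an illustration of the time-to-first-spike principle, where stronger inputs fire earlier so a single spike per neuron carries the value, here is a small encoding sketch; the linear latency mapping and the time window are assumptions rather than the improved scheme proposed in the dissertation.

```python
import numpy as np

def ttfs_encode(x, t_window=100):
    """Encode normalized intensities in [0, 1] as first-spike times: larger x fires earlier."""
    x = np.clip(x, 0.0, 1.0)
    return (1.0 - x) * t_window              # x = 1 -> fires at t = 0, x = 0 -> fires at t_window

pixels = np.array([0.9, 0.5, 0.1])
print(ttfs_encode(pixels))                    # [10., 50., 90.]
```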
Finally, this dissertation presents hardware/software co-design techniques for the implementation of neuromorphic computing systems with emerging nonvolatile memories. A hardware/software co-simulation flow is developed, and based on it, a closed-loop design is proposed to enhance weight stability against read disturbance.
In summary, this dissertation tackles important problems in designing neuromorphic computing systems with emerging nonvolatile memories. The outcome of this research is expected not only to pave the way for realizing highly efficient artificial intelligence hardware, but also to shorten the product development cycle.
Item Open Access: Hybrid Digital/Analog In-Memory Computing (2024). Zheng, Qilin.
The relentless advancement of deep learning applications, particularly the highly potent yet computationally intensive deep unsupervised learning models, is pushing the boundaries of what modern general-purpose CPUs and GPUs can handle in terms of computation, communication, and storage capacities. To meet these burgeoning memory and computational demands, computing systems based on in-memory computing, which extensively utilize accelerators, are emerging as the next frontier in computing technology. This thesis delves into my research efforts aimed at overcoming these obstacles to develop a processing-in-memory based computing system tailored for machine learning tasks, with a focus on employing a hybrid digital/analog design approach.
In the initial part of my work, I introduce a novel concept that leverages hybrid digital/analog in-memory computing to enhance the efficiency of depth-wise convolution applications. This approach not only optimizes computational efficiency but also paves the way for more energy-efficient machine learning operations.
Following this, I expand upon the initial concept by presenting a design methodology that applies hybrid digital/analog in-memory computing to the processing of sparse attention operators. This extension significantly improves mapping efficiency, making it a vital enhancement for the processing capabilities of deep learning models that rely heavily on attention mechanisms.
In my third piece of work, I detail the implementation strategies aimed at augmenting the power efficiency of in-memory computing macros. By integrating hybrid digital/analog computing concepts, this implementation focuses on general-purpose neural network acceleration, showcasing a significant step forward in reducing the energy consumption of such computational processes.
Lastly, I introduce a system-level simulation tool designed for simulating general-purpose in-memory-computing based systems. This tool facilitates versatile architecture exploration, allowing for the assessment and optimization of various configurations to meet the specific needs of machine learning workloads. Through these comprehensive research efforts, this thesis contributes to the advancement of in-memory computing technologies, offering novel solutions to the challenges posed by the next generation of machine learning applications.
Item Open Access: Improving the Efficiency and Robustness of In-Memory Computing in Emerging Technologies (2023). Yang, Xiaoxuan.
Emerging technologies, such as resistive random-access memory (ReRAM), have proven their potential in in-memory computing for deep learning applications. My dissertation work focuses on improving the efficiency and robustness of in-memory computing in emerging technologies.
Existing ReRAM-based processing-in-memory (PIM) designs can support the inference and training of neural networks such as convolutional neural networks and recurrent neural networks. However, these designs suffer from the re-writing procedure required for the self-attention calculation. Therefore, I propose an architecture that enables an efficient self-attention mechanism in PIM designs. The optimized calculation procedure and finer-granularity pipeline design improve efficiency. The contribution lies in enabling feasible and efficient ReRAM-based PIM designs for attention-based models.
Inference with ReRAM-based designs has one severe problem: inference accuracy can be degraded by non-idealities in the hardware devices. The robustness of previous methods has not been validated under combinations of stochastic device noise. With the proposed hardware-aware training method, the robustness of inference accuracy can be improved. In addition, targeting both hardware efficiency and inference robustness, a multi-objective optimization method is developed to explore the design space and generate high-quality Pareto-optimal design configurations at minimal cost. This work integrates attributes from the design space and the evaluation space and develops efficient hardware-software co-design methods.
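A common way to realize such hardware-aware training is to inject device-like noise into the weights during the forward pass so that the network learns to tolerate it; the sketch below uses a multiplicative Gaussian noise model as an illustrative assumption, not the exact noise model of the dissertation.

```python
import torch
import torch.nn as nn

class NoisyLinear(nn.Linear):
    """Linear layer that perturbs its weights with multiplicative noise during training."""
    def __init__(self, in_features, out_features, noise_std=0.05):
        super().__init__(in_features, out_features)
        self.noise_std = noise_std

    def forward(self, x):
        if self.training:
            noise = 1 + self.noise_std * torch.randn_like(self.weight)
            return nn.functional.linear(x, self.weight * noise, self.bias)
        return super().forward(x)

# Usage: replace nn.Linear with NoisyLinear in the model, train as usual, then
# evaluate with separately sampled noise to estimate robustness to device variation.
```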
Training with ReRAM-based designs faces a challenging endurance problem due to the frequent weight updates required for neural network training. The goal of endurance management is to decrease the number of weight updates and balance the write accesses. The proposed endurance-aware training method utilizes gradient structure pruning and dynamically adjusts the write probabilities in a structured manner. This method can extend the lifetime of ReRAM during the training process.
In summary, the research above targets realizing efficient self-attention mechanisms and solving the accuracy degradation and endurance problems of the inference and training processes. In addition, the effort lies in identifying the challenging parts of each topic and developing hardware-software co-designs that consider both efficiency and robustness. The developed designs are potential solutions to the challenging problems of in-memory computing in emerging technologies.
Item Open Access: In-Memory Computing Architecture for Deep Learning Acceleration (2020). Chen, Fan.
The ever-increasing demands of deep learning applications, especially the more powerful but intensive unsupervised deep learning models, overwhelm the computation, communication, and storage capabilities of modern general-purpose CPUs and GPUs. To accommodate the memory and computing requirements, multi-core systems that make intensive use of accelerators are becoming the future of computing. Such novel computing systems incur new challenges, including architectural support for model training in the accelerators, large cache demands for multi-core processors, and system performance, energy, and efficiency. In this thesis, I present my research that addresses these challenges by leveraging emerging memory and logic devices, as well as advanced integration technologies. In the first work, I present the first training accelerator architecture, ReGAN, for unsupervised deep learning. ReGAN follows the processing-in-memory strategy by leveraging the energy efficiency of resistive memory arrays for in-situ deep learning execution. I propose an efficient pipelined training procedure to reduce on-chip memory access. In the second work, I present ZARA to address the resource underutilization caused by a new operator, namely transposed convolution, used in unsupervised learning models. ZARA improves system efficiency through a novel computation deformation technique. In the third work, I present MARVEL, which targets improving the power efficiency of previous resistive accelerators. MARVEL leverages monolithic 3D integration technology by stacking multiple layers of low-power analog/digital conversion circuits implemented with carbon nanotube field-effect transistors. The area-consuming eDRAM buffers are replaced by dense cross-point Spin Transfer Torque Magnetic RAM. I explore the design space and demonstrate that MARVEL can provide further improved power efficiency with an increased number of integration layers. In the last piece of work, I propose the first holistic solution for employing skyrmion racetrack memory as the last-level cache for future high-capacity cache designs. I first present a cache architecture and a physical-to-logical mapping scheme based on a comprehensive analysis of the working mechanism of skyrmion racetrack memory. I then model the impact of process variations and propose a process-variation-aware data management technique to minimize the performance degradation they incur.
Item Open Access: Robustness Analysis and Improvement in Neural Networks and Neuromorphic Computing (2021). Song, Chang.
Deep learning and neural networks have great potential but remain at risk. The so-called adversarial attacks, which apply small perturbations to input samples to fool models, threaten the reliability of neural networks and their hardware counterparts, neuromorphic computing systems. To address such issues, various attempts have been made, including adversarial training and other data augmentation methods. In our early attempt to defend against adversarial attacks, we propose a multi-strength adversarial training method to cover a wider effective range than typical single-strength adversarial training. Furthermore, we propose two different structures to compensate for the tradeoff between total training time and hardware implementation cost. Experimental results show that our proposed method gives better accuracy than the baselines with tolerable additional hardware cost. To better understand robustness, we analyze the adversarial problem in the decision space. In one of our defense approaches, called feedback learning, we theoretically prove the effectiveness of adversarial training and other data augmentation methods. As empirical proof, we generate non-adversarial examples based on the information of the decision boundaries of neural networks and add these examples to training. The results show that the boundaries of the models are more robust to noise and perturbations after applying feedback learning than the baselines. Besides algorithm-level concerns, we also focus on hardware implementations in quantization scenarios. We find that adversarially trained neural networks are more vulnerable to quantization loss than plain models. To improve the robustness of hardware-based quantized models, we explore methods such as feedback learning, nonlinear mapping, and layer-wise quantization. Results show that adversarial and quantization robustness can be improved by feedback learning and nonlinear mapping, respectively, but the accuracy gap introduced by quantization can be further minimized. To minimize both losses simultaneously, we also propose a layer-wise adversarial-aware quantization method to choose the best quantization parameter settings for adversarially trained models. In this method, we use the Lipschitz constants of different layers as error sensitivity metrics and design several criteria to decide the quantization settings for each layer. The results show that our method can further minimize the accuracy gap between full-precision and quantized adversarially trained models.
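To illustrate the multi-strength idea, the sketch below generates single-step FGSM adversarial examples at several perturbation strengths and mixes them with the clean batch; FGSM as the attack and the particular strength set are assumptions for illustration, not the dissertation's exact training recipe.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Single-step FGSM adversarial example with perturbation strength eps."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

def multi_strength_batch(model, x, y, strengths=(2/255, 4/255, 8/255)):
    """Build a training batch containing clean data plus several attack strengths."""
    batches = [x] + [fgsm(model, x, y, e) for e in strengths]
    return torch.cat(batches), y.repeat(len(batches))

# Usage: x_adv, y_adv = multi_strength_batch(model, x, y)
#        loss = F.cross_entropy(model(x_adv), y_adv)
```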
Item Open Access: Security and Robustness in Neuromorphic Computing and Deep Learning (2020). Yang, Chaofei.
Machine learning (ML) has advanced rapidly in the recent decade. Among many ML algorithms, neural networks (NNs) and neuromorphic computing systems (NCSs), inspired by biological neural systems, achieve state-of-the-art performance. With the development of computing resources and big data, deep neural networks (DNNs), also known as deep learning (DL), are applied in various applications such as image recognition and detection, feature extraction, and natural language processing. However, novel security threats are introduced in these applications. Attackers try to steal, tamper with, and destroy the models, incurring immeasurable losses. We do not yet fully understand these threats, because NNs are black boxes and are under active development. The complexity of NNs also exposes more vulnerabilities than traditional ML algorithms. To address these security threats, this dissertation focuses on identifying novel security threats against NNs and revisiting traditional issues from the perspective of NNs. We also analyze the key mechanisms of these attacks, explore their variations, and develop robust defenses against them.
One of our works aims at preventing attackers with physical access from learning the proprietary algorithm implemented by the neuromorphic hardware, i.e., a replication attack. For this purpose, we leverage the obsolescence effect in memristors to judiciously reduce the accuracy of outputs for any unauthorized user. Our methodology is verified to be compatible with mainstream classification applications, memristor devices, and security and performance constraints. In many applications, public data may be poisoned when collected as inputs for re-training DNNs. Although poisoning attacks against support vector machines (SVMs) have been extensively studied, we still have very limited knowledge and understanding of how such an attack can be implemented against neural networks. Thus, we examine the possibility of directly applying a gradient-based method to generate poisoned samples against neural networks. We then propose a generative method to accelerate the generation of poisoned samples while maintaining high attack efficiency. Experimental results show that the generative method can significantly accelerate the generation rate of poisoned samples compared with the numerical gradient method, with marginal degradation of model accuracy. Deepfake represents a category of face-swapping attacks that leverage machine learning models such as autoencoders or generative adversarial networks. Various detection techniques for Deepfake attacks have been explored. These methods, however, are passive measures against Deepfakes, as they are mitigation strategies applied after the high-quality fake content is generated. More importantly, we would like to think ahead of the attackers with robust defenses. This work aims to take an offensive measure to impede the generation of high-quality fake images or videos. We propose to use novel transformation-aware adversarially perturbed faces as a defense against GAN-based Deepfake attacks. Additionally, we explore techniques for data preprocessing and augmentation to enhance models' robustness. Specifically, we leverage convolutional neural networks (CNNs) to automate the wafer inspection process and propose several techniques to preprocess and augment wafer images to enhance our model's generalization on unseen wafers (e.g., from other fabs).
Item Open Access: Software-Hardware Co-design For Deep Learning Model Acceleration (2023). Zhang, Jingchi.
Current deep neural network (DNN) models have shown beyond-human performance in multiple artificial intelligence tasks. However, state-of-the-art DNN models still exhibit significant efficiency issues that pose obstacles to their practical application in real-world scenarios. To further improve performance, modern DNN models keep increasing their model sizes and numbers of operations, yet it remains a great challenge to deploy modern DNNs on mobile and edge devices because of their limited memory, computation resources, and battery energy. This dissertation seeks to address these challenges by advancing and integrating techniques from both software design and hardware design for efficient DNN training and inference, with the ultimate goal of developing accurate and efficient DNN models.
My research primarily focuses on advancing model compression techniques such as pruning and quantization to push the boundary of the efficiency-accuracy tradeoff in DNN models. For pruning, I propose efficient structural sparsity (ESS), a learning framework that learns efficient structured sparsity in DNN models. Additionally, I extend ESS to acoustic applications such as speech recognition and speaker identification, demonstrating its effectiveness in various contexts. For quantization, I propose the Heterogeneously Compressed Ensemble (HCE), a novel, straightforward method that builds an efficient ensemble from the pruned and quantized variants of a pretrained DNN model. These efforts have resulted in DNN models that are more accurate and efficient than those produced by existing state-of-the-art model compression methods. For hardware design, I designed an end-to-end neural-network-enhanced radar signal processing system on an FPGA; the FPGA implementation is carefully optimized to better trade off performance against energy efficiency. Finally, for software-hardware co-design, I propose Hessian-aware N:M (HANM) pruning, a novel search method to find the optimal mixed N:M sparsity scheme for deep neural networks. On the hardware side, we design and simulate a corresponding hardware architecture that supports the various N:M sparsity schemes. With this combination, HANM demonstrates strong performance in real-world inference scenarios.
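To make the N:M sparsity scheme concrete (within every group of M consecutive weights only N are kept nonzero, a pattern sparse hardware can exploit), here is a small masking sketch; the Hessian-aware search over mixed N:M schemes in HANM is not shown.

```python
import torch

def nm_prune(weight, n=2, m=4):
    """Zero all but the n largest-magnitude weights in every group of m consecutive weights."""
    w = weight.reshape(-1, m)                             # group weights in chunks of m
    idx = w.abs().topk(n, dim=1).indices                  # indices of the n survivors per group
    mask = torch.zeros_like(w).scatter_(1, idx, 1.0)
    return (w * mask).reshape(weight.shape)

w = torch.randn(8, 8)
print(nm_prune(w, n=2, m=4))                               # exactly 2 nonzeros per group of 4
```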
This dissertation research aims to pave the way for achieving a tradeoff between accuracy, efficiency and power consumption in DNN models, ultimately leading to the development of DNN models that are both accurate and efficient.
Item Open Access: Toward Trustworthy Machine Learning with Blackbox and Whitebox Methods (2023). Qiao, Ximing.
With the growing applications of machine learning (ML) in high-stakes areas such as autonomous driving, medical assistance, and financial prediction, building trustworthy ML models with reliable performance in novel situations becomes increasingly important. While most existing ML methods achieve good average performance on standard test data, their worst-case performance on adversarial or out-of-distribution data, both common in real-world scenes, can be arbitrarily bad. This dissertation discusses blackbox and whitebox methods, as short-term and long-term solutions respectively, to the trustworthiness issue. The blackbox methods consider immediate remedies to existing ML systems, treat such systems as black boxes, and aim to wrap them with an extra layer of protection against common adversaries. Two specific attack settings are discussed, in which attackers either modify images with small stickers or poison a small portion of the training data to inject backdoors. The proposed solutions include a neural-guided sticker reverse engineering technique and an ensemble training method based on a novel backdoor detection code. While universal to all types of ML systems, the blackbox methods also require strong assumptions about the attack. Next, the whitebox methods explore new families of ML models that mimic human reasoning capability, generalize to open domains, and are trustworthy by design. A novel neural representation of probabilistic programs extends existing neural networks to capture complex probabilistic knowledge of the world and perform inference. A reinforcement learning-inspired inference algorithm addresses the efficiency issue in a single input-output setting. Although such methods still have difficulty handling real-world high-dimensional signals, the initial results demonstrate their potential as a long-term solution to fundamentally address the challenging trustworthiness problem.
Item Open Access: Towards Efficient and Robust Deep Neural Network Models (2022). Yang, Huanrui.
Recently, deep neural network (DNN) models have shown beyond-human performance in multiple tasks. However, DNN models still exhibit outstanding issues in efficiency and robustness that hinder their application in the real world. For efficiency, modern DNN architectures often contain millions of parameters and require billions of operations to process a single input, making it hard to deploy these models on mobile and edge devices. For robustness, recent research on adversarial attacks shows that most DNN models can be misled by tiny perturbations added to the input, casting doubt on the robustness of DNNs in security-related tasks. To tackle these challenges, this dissertation aims to advance and incorporate techniques from both fields of DNN efficiency and robustness, leading towards efficient and robust DNN models.
My research first advances model compression techniques, including pruning, low-rank decomposition, and quantization, to push the boundary of the efficiency-accuracy tradeoff in DNN models. For pruning, I propose DeepHoyer, a new sparsity-inducing regularizer that is both scale-invariant and differentiable. For decomposition, I apply the sparsity-inducing regularizer to the decomposed singular values of DNN layers, together with an orthogonality regularization on the singular vectors. For quantization, I propose BSQ to achieve an optimal mixed-precision quantization scheme by exploring bit-level sparsity, mitigating the costly search through the large design space of quantization precisions. All these works successfully achieve DNN models that are both more accurate and more efficient than state-of-the-art methods. For robustness improvement, I change the previously undesired accuracy-robustness tradeoff of a single DNN model into an efficiency-robustness tradeoff of a DNN ensemble, without hurting the clean accuracy. The method, DVERGE, combines a vulnerability diversification objective with previously investigated model compression techniques, leading to an efficient ensemble whose robustness increases with the number of sub-models. Finally, I propose to unify the pursuit of accuracy and efficiency as an optimization toward robustness against weight perturbation, and introduce Hessian-Enhanced Robust Optimization to achieve highly accurate models that are robust to post-training quantization. The accomplishment of my dissertation research paves the way toward controlling the tradeoff between accuracy, efficiency, and robustness, and leads to efficient and robust DNN models.
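For reference, the Hoyer-Square quantity behind DeepHoyer, the ratio of the squared L1 norm to the squared L2 norm of a weight tensor, can be written in a few lines; treat this as a sketch of the regularizer term only, added to the task loss with an assumed weighting coefficient.

```python
import torch

def hoyer_square(w, eps=1e-8):
    """Hoyer-Square regularizer: (sum|w|)^2 / sum(w^2); scale-invariant and differentiable."""
    return w.abs().sum() ** 2 / (w.pow(2).sum() + eps)

# Usage sketch (lambda_reg is a hypothetical weighting coefficient):
# total_loss = task_loss + lambda_reg * sum(hoyer_square(p) for p in model.parameters())
```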