Browsing by Subject "Robustness"
Item Open Access: Improving the Efficiency and Robustness of In-Memory Computing in Emerging Technologies (2023). Yang, Xiaoxuan.
Emerging technologies, such as resistive random-access memory (ReRAM), have proven their potential in in-memory computing for deep learning applications. My dissertation work focuses on improving the efficiency and robustness of in-memory computing in emerging technologies.
Existing ReRAM-based processing-in-memory (PIM) designs can support both the inference and the training of neural networks such as convolutional neural networks and recurrent neural networks. However, these designs are hampered by the cell re-writing required for the self-attention calculation. I therefore propose an architecture that enables an efficient self-attention mechanism in PIM designs. An optimized calculation procedure and a finer-granularity pipeline design improve efficiency. The contribution lies in enabling feasible and efficient ReRAM-based PIM designs for attention-based models.
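For context, the re-writing issue arises because self-attention multiplies matrices that are themselves computed from the current input, whereas ReRAM crossbars are most efficient when the stored matrix is static. The sketch below is a generic NumPy rendering of scaled dot-product attention, not the architecture proposed in the dissertation; it only marks which products use static projection weights and which use input-dependent operands.

```python
# Illustrative sketch (not the dissertation's architecture): standard scaled
# dot-product attention. The projections W_q, W_k, W_v are static and map
# naturally onto pre-programmed ReRAM crossbars, whereas Q @ K.T and A @ V
# multiply two *input-dependent* matrices, which is what would force cell
# re-writing in a naive PIM mapping.
import numpy as np

def scaled_dot_product_attention(X, W_q, W_k, W_v):
    Q, K, V = X @ W_q, X @ W_k, X @ W_v          # static-weight matmuls
    scores = Q @ K.T / np.sqrt(Q.shape[-1])      # dynamic-operand matmul
    A = np.exp(scores - scores.max(axis=-1, keepdims=True))
    A /= A.sum(axis=-1, keepdims=True)           # softmax over keys
    return A @ V                                 # another dynamic-operand matmul

# Example: 4 tokens with 8-dimensional embeddings
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
W_q, W_k, W_v = (rng.standard_normal((8, 8)) for _ in range(3))
out = scaled_dot_product_attention(X, W_q, W_k, W_v)
```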
Inference with ReRAM-based designs has one severe problem: accuracy can be degraded by non-idealities in the hardware devices, and the robustness of previous methods has not been validated under combined sources of stochastic device noise. With the proposed hardware-aware training method, the robustness of inference accuracy can be improved. In addition, taking hardware efficiency and inference robustness as joint targets, a multi-objective optimization method is developed to explore the design space and generate high-quality Pareto-optimal design configurations at minimal cost. This work integrates attributes from the design space and the evaluation space and develops efficient hardware-software co-design methods.
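As an illustration of the general idea behind hardware-aware training (not the dissertation's specific method), one common approach is to inject a stochastic device-noise model into the weights during the forward pass so that training finds weights tolerant to conductance variation. The `NoisyLinear` module and the 5% noise level below are assumptions made for the sketch.

```python
# Illustrative sketch only: hardware-aware training by injecting a simple
# multiplicative Gaussian noise model into the weights during the forward pass,
# emulating cell-to-cell conductance variation in ReRAM. The actual noise model
# and training procedure in the dissertation may differ.
import torch
import torch.nn as nn

class NoisyLinear(nn.Linear):
    def __init__(self, in_features, out_features, noise_std=0.05):
        super().__init__(in_features, out_features)
        self.noise_std = noise_std  # assumed relative conductance variation

    def forward(self, x):
        if self.training:
            noise = 1.0 + self.noise_std * torch.randn_like(self.weight)
            return nn.functional.linear(x, self.weight * noise, self.bias)
        return super().forward(x)

# Usage: swap nn.Linear for NoisyLinear and train as usual.
model = nn.Sequential(NoisyLinear(784, 256), nn.ReLU(), NoisyLinear(256, 10))
```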
Training with ReRAM-based designs faces a challenging endurance problem because neural network training requires frequent weight updates. Endurance management aims to reduce the number of weight updates and to balance write accesses across cells. The proposed endurance-aware training method uses gradient structure pruning and dynamically adjusts write probabilities in a structured manner, extending the ReRAM life cycle during the training process.
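The following is a minimal sketch of how structured, write-balanced updates could look, assuming row-wise granularity, a fixed update budget, and per-row write counters. The dissertation's gradient structure pruning and write-probability adjustment are not specified here, so the scoring rule and the `budget` parameter should be read as illustrative assumptions only.

```python
# Illustrative sketch only (not the dissertation's exact algorithm): structured
# weight updates that favor rows with fewer accumulated writes, so write
# traffic is both reduced (only a fraction of rows per step) and balanced.
import torch

def endurance_aware_update(weight, grad, write_counts, lr=0.01, budget=0.25):
    # Rank rows by gradient magnitude, discounted by how often they were written.
    row_score = grad.abs().sum(dim=1) / (1.0 + write_counts)
    k = max(1, int(budget * weight.shape[0]))    # update only a fraction of rows
    rows = torch.topk(row_score, k).indices
    weight[rows] -= lr * grad[rows]              # structured (row-wise) update
    write_counts[rows] += 1                      # track writes for balancing
    return weight, write_counts

W = torch.randn(8, 4)
G = torch.randn(8, 4)
counts = torch.zeros(8)
W, counts = endurance_aware_update(W, G, counts)
```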
In summary, the research above targets an efficient self-attention mechanism and addresses the accuracy-degradation and endurance problems of the inference and training processes. The effort lies in identifying the challenging part of each topic and developing hardware-software co-designs that account for both efficiency and robustness. The resulting designs offer potential solutions to the challenging problems of in-memory computing in emerging technologies.
Item Open Access: Multimodal Probabilistic Inference for Robust Uncertainty Quantification (2021). Jerfel, Ghassen.
Deep learning models, which form the backbone of modern ML systems, generalize poorly to small changes in the data distribution. They are also bad at signalling failure, making predictions with high confidence when their training data or fragile assumptions make them unlikely to make reasonable decisions. This lack of robustness makes it difficult to trust their use in safety-critical settings. Accordingly, there is a pressing need to equip models with a notion of uncertainty, to understand their failure modes, and to detect when their decisions cannot be used or require intervention. Uncertainty quantification is thus crucial for ML systems to work consistently on real-world data and to fail loudly when they don't.
One growing line of research on uncertainty quantification is probabilistic modelling, which is concerned with capturing model uncertainty by placing a distribution over models that can be marginalized at test time. This is especially useful for underspecified models, which can have diverse near-optimal solutions at training time with similar population-level performance. However, probabilistic modelling approaches such as Bayesian neural networks (BNNs) do not scale well in terms of memory and runtime and often underperform simple deterministic baselines in terms of accuracy. Furthermore, BNNs underperform deep ensembles because they fail to explore multiple modes in the loss space, even though they are effective at capturing uncertainty within a single mode.
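For reference, "placing a distribution over the models which can be marginalized at test time" corresponds to the standard Bayesian posterior predictive, in which predictions are averaged over the posterior on parameters rather than committed to a single point estimate:

```latex
% Posterior predictive: average the model's predictions over the parameter
% posterior instead of using one fixed parameter setting.
p(y \mid x, \mathcal{D}) \;=\; \int p(y \mid x, \theta)\, p(\theta \mid \mathcal{D})\, d\theta
```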
In this thesis, we develop multimodal representations of model uncertainty that can capture a diverse set of hypotheses. We first propose a scalable family of BNN priors (and corresponding approximate posteriors) that combine local (i.e., within-mode) uncertainty with mode averaging to deliver robust and calibrated uncertainty estimates, in addition to improving accuracy both in and out of distribution. We then leverage a multimodal representation of uncertainty to modulate the amount of information transfer between tasks in meta-learning. Our proposed framework integrates Bayesian non-parametric mixtures with deep learning to enable neural networks to adapt their capacity as more data is observed, which is crucial for lifelong learning. Finally, we propose to replace the reverse Kullback-Leibler divergence (RKL), known for its mode-seeking behavior and for underestimating posterior covariance, with the forward KL (FKL) divergence in a novel, theoretically guided inference procedure that efficiently combines variational boosting with adaptive importance sampling. The proposed algorithm offers a well-defined compute-accuracy trade-off and is guaranteed to converge to the optimal multimodal variational solution as well as the optimal importance sampling proposal distribution.
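For reference, the two divergence directions mentioned above are, for an approximation q and target posterior p(θ | D):

```latex
% Reverse vs. forward KL between the approximation q and the target posterior p.
\mathrm{KL}(q \,\|\, p) = \mathbb{E}_{q(\theta)}\!\left[\log \frac{q(\theta)}{p(\theta \mid \mathcal{D})}\right],
\qquad
\mathrm{KL}(p \,\|\, q) = \mathbb{E}_{p(\theta \mid \mathcal{D})}\!\left[\log \frac{p(\theta \mid \mathcal{D})}{q(\theta)}\right]
```

Minimizing the reverse direction lets q collapse onto a single mode and underestimate covariance, which is why the thesis moves to the forward, mass-covering direction for multimodal targets.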
Item Open Access: Robustness Analysis and Improvement in Neural Networks and Neuromorphic Computing (2021). Song, Chang.
Deep learning and neural networks have great potential but remain at risk. So-called adversarial attacks, which apply small perturbations to input samples to fool models, threaten the reliability of neural networks and of their hardware counterpart, neuromorphic computing. To address these issues, various attempts have been made, including adversarial training and other data augmentation methods.
In our early attempt to defend against adversarial attacks, we propose a multi-strength adversarial training method that covers a wider effective range than typical single-strength adversarial training. We also propose two different structures to compensate for the tradeoff between total training time and hardware implementation cost. Experimental results show that our proposed method gives better accuracy than the baselines with tolerable additional hardware cost. To better understand robustness, we analyze the adversarial problem in the decision space. In one of our defense approaches, called feedback learning, we theoretically prove the effectiveness of adversarial training and other data augmentation methods. For empirical proof, we generate non-adversarial examples based on the decision boundaries of neural networks and add these examples to training. The results show that the boundaries of models trained with feedback learning are more robust to noise and perturbations than the baselines. Beyond algorithm-level concerns, we also focus on hardware implementations in quantization scenarios. We find that adversarially trained neural networks are more vulnerable to quantization loss than plain models. To improve the robustness of hardware-based quantized models, we explore methods such as feedback learning, nonlinear mapping, and layer-wise quantization. Results show that adversarial and quantization robustness can be improved by feedback learning and nonlinear mapping, respectively, but the accuracy gap introduced by quantization can still be reduced further. To minimize both losses simultaneously, we propose a layer-wise adversarial-aware quantization method that chooses the best quantization parameter settings for adversarially trained models. In this method, we use the Lipschitz constants of different layers as error-sensitivity metrics and design several criteria to decide the quantization settings for each layer. The results show that our method can further minimize the accuracy gap between full-precision and quantized adversarially trained models.
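As a rough illustration of the multi-strength idea (not the dissertation's exact attack, strengths, or network structures), the sketch below generates FGSM adversarial examples at several perturbation strengths per batch and averages the losses; the epsilon list and the single-step attack are assumptions made for the sketch.

```python
# Illustrative sketch only: adversarial training with FGSM examples generated
# at multiple perturbation strengths, rather than a single fixed epsilon.
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    # Single-step attack: perturb the input along the sign of the loss gradient.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    return (x + eps * grad.sign()).clamp(0, 1).detach()  # assumes inputs in [0, 1]

def multi_strength_step(model, optimizer, x, y, eps_list=(2/255, 4/255, 8/255)):
    # Train on the clean batch plus adversarial batches at each strength.
    batches = [x] + [fgsm(model, x, y, eps) for eps in eps_list]
    optimizer.zero_grad()
    loss = sum(F.cross_entropy(model(b), y) for b in batches) / len(batches)
    loss.backward()
    optimizer.step()
    return loss.item()
```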
Item Open Access: Robustness and Generalization Under Distribution Shifts (2022). Bertran Lopez, Martin Andres.
Machine learning algorithms are applied in a wide variety of fields such as finance, healthcare, and entertainment. The objectives of these machine learning algorithms are varied, with two of the most common use cases being inference of a target variable from observations, and sequential decision-making to maximize a reward in the reinforcement learning setting. Regardless of the objective, it is common for machine learning algorithms to be trained on a finite dataset where each sample is collected independently from some data distribution emulating the real world, or, in the case of reinforcement learning, over a finite set of interactions with an environment simulating real-world interactions.
One major concern is how to characterize the generalization of these objectives outside of their training data, measured as the discrepancy between performance on the training dataset or environment and performance in the real world. This is exacerbated by the fact that many applications suffer from distribution shift, a phenomenon in which there is a mismatch between the training distribution and the real-world environment. Algorithms that are not robust to distribution shifts are liable to exhibit unintended behaviours during deployment. In this work, we develop tools to minimize the risks posed by distribution shifts in a variety of settings. In the first part of this work, we propose and analyze techniques to deal with distribution shifts in the supervised learning setting, making the model's decision either independent of or robust to certain factors in the input distribution, and show the efficacy of these techniques in dealing with distribution shift. We later examine the setting of sequential decision-making, where we discuss how to reinterpret the reinforcement learning scenario in a way that allows generalization bounds from standard supervised learning to be applied to reinforcement learning. We then analyze how to learn representations that are invariant to task-irrelevant distribution factors, and demonstrate how this can improve performance in the presence of distribution shifts.
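As a generic reference point rather than the thesis's formal definition, the discrepancy described above can be written, for a predictor f and loss ℓ, as the gap in risk between the training distribution P and the deployment distribution Q:

```latex
% Risk gap induced by a shift from training distribution P to deployment
% distribution Q; an algorithm robust to the shift keeps this gap small.
\Delta(f) \;=\; \Big|\, \mathbb{E}_{(x,y)\sim Q}\big[\ell(f(x), y)\big] \;-\; \mathbb{E}_{(x,y)\sim P}\big[\ell(f(x), y)\big] \,\Big|
```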
Item Open Access: Towards Efficient and Robust Deep Neural Network Models (2022). Yang, Huanrui.
Recently, deep neural network (DNN) models have shown beyond-human performance on multiple tasks. However, DNN models still exhibit outstanding issues of efficiency and robustness that hinder their application in the real world. For efficiency, modern DNN architectures often contain millions of parameters and require billions of operations to process a single input, making it hard to deploy these models on mobile and edge devices. For robustness, recent research on adversarial attacks shows that most DNN models can be misled by tiny perturbations added to the input, casting doubt on the robustness of DNNs in security-related tasks. To tackle these challenges, this dissertation aims to advance and integrate techniques from both fields, DNN efficiency and DNN robustness, leading towards efficient and robust DNN models.
My research first advances model compression techniques, including pruning, low-rank decomposition, and quantization, to push the boundary of the efficiency-accuracy tradeoff in DNN models. For pruning, I propose DeepHoyer, a new sparsity-inducing regularizer that is both scale-invariant and differentiable. For decomposition, I apply the sparsity-inducing regularizer to the decomposed singular values of DNN layers, together with an orthogonality regularization on the singular vectors. For quantization, I propose BSQ to achieve an optimal mixed-precision quantization scheme by exploring bit-level sparsity, mitigating the costly search through the large design space of quantization precision. All of these works achieve DNN models that are both more accurate and more efficient than state-of-the-art methods. For robustness improvement, I convert the previously undesirable accuracy-robustness tradeoff of a single DNN model into an efficiency-robustness tradeoff of a DNN ensemble, without hurting clean accuracy. The method, DVERGE, combines a vulnerability diversification objective with the previously investigated model compression techniques, leading to an efficient ensemble whose robustness increases with the number of sub-models. Finally, I propose to unify the pursuit of accuracy and efficiency as an optimization for robustness against weight perturbation, and introduce Hessian-Enhanced Robust Optimization to achieve highly accurate models that are robust to post-training quantization. My dissertation research paves the way towards controlling the tradeoff between accuracy, efficiency, and robustness, and leads to efficient and robust DNN models.
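As a hedged illustration of the kind of regularizer DeepHoyer describes, the sketch below implements a scale-invariant, differentiable L1-over-L2 penalty (the Hoyer-Square form: squared L1 norm divided by squared L2 norm). The regularization strength and the choice to penalize only multi-dimensional parameters are assumptions; consult the DeepHoyer paper for the exact element-wise and structural variants.

```python
# Illustrative sketch: a scale-invariant, differentiable sparsity regularizer in
# the spirit of DeepHoyer's Hoyer-Square measure. Scaling all weights by a
# constant leaves the penalty unchanged, unlike a plain L1 penalty.
import torch

def hoyer_square(weight, eps=1e-8):
    l1 = weight.abs().sum()
    l2_sq = (weight ** 2).sum()
    return (l1 ** 2) / (l2_sq + eps)

def total_loss(task_loss, model, reg_strength=1e-4):
    # Apply the penalty to weight matrices/tensors only (skip biases).
    reg = sum(hoyer_square(p) for p in model.parameters() if p.dim() > 1)
    return task_loss + reg_strength * reg
```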
Item Open Access: Universal Biology (2014). Mariscal, Carlos.
Our only example of life is that of Earth, which is a single lineage. We know very little about what life would look like if we found evidence of a second origin. Yet there are some universal features of geometry, mechanics, and chemistry that have predictable biological consequences. The surface-to-volume ratio, a property of geometry, for example, places a maximum limit on the size of unassisted cells in a given environment. This effect is universal, interesting, not vague, and not arbitrary. Furthermore, there are some problems in the universe that life must invariably solve if it is to persist, such as resistance to radiation, faithful inheritance, and resistance to environmental pressures. At least with respect to these universal problems, some solutions must consistently emerge.
In this dissertation, I develop and defend my own account of universal biology, the study of non-vague, non-arbitrary, non-accidental, universal generalizations in biology. In my account, a candidate biological generalization is assessed in terms of the assumptions it makes. A successful claim is accepted only if its justification necessarily makes reference to principles of evolution and makes no reference to contingent facts of life on Earth. In this way, we can assess the robustness with which generalizations can be expected to hold. I contend that using a stringent-enough causal analysis, we are able to gather insight into the nature of life everywhere. Life on Earth may be our single example of life, but this is merely a reason to be cautious in our approach to life in the universe, not a reason to give up altogether.