Browsing by Subject "Computer engineering"
Item Open Access A Data-Intensive Framework for Analyzing Dynamic Supreme Court Behavior (2012) Calloway, Timothy Joseph
Many law professors and scholars think of the Supreme Court as a black box--issues and arguments go in to the Court, and decisions come out. The almost mystical nature that these researchers impute to the Court seems to be a function of the lack of hard data and statistics about the Court's decisions. Without a robust dataset from which to draw proper conclusions, legal scholars are often left only with intuition and conjecture.
Explaining the inner workings of one of the most important institutions in the United States using such a subjective approach is obviously flawed. And, indeed, data is available that can provide researchers with a better understanding of the Court's actions, but scholars have been slow in adopting a methodology based on data and statistical analysis. The sheer quantity of available data is overwhelming and might provide one reason why such an analysis has not yet been undertaken.
Relevant data for these studies is available from a variety of sources, but two in particular are of note. First, legal database provider LexisNexis provides a huge amount of information about how the Court's opinions are treated by subsequent opinions; thus, if the Court later overrules one of its earlier opinions, that information is captured by LexisNexis. Second, researchers at Washington University in St. Louis have compiled a database that provides detailed information about each Supreme Court decision. Combining these two sources into a coherent database will provide a treasure trove of results for future researchers to study, use, and build upon.
This thesis will explore a first-of-its-kind attempt to parse these massive datasets to provide a powerful tool for future researchers. It will also provide a window to help the average citizen understand Supreme Court behavior more clearly. By utilizing traditional data extraction and dataset analysis methods, many informative conclusions can be reached to help explain why the Court acts the way it does. For example, the results show that decisions decided by a narrow margin (i.e., by a 5 to 4 vote) are almost 4x more likely to be overruled than unanimous decisions by the Court. Many more results like these can be synthesized from the dataset and will be presented in this thesis. Perhaps more importantly, this thesis presents a framework to predict the outcomes of future and pending Supreme Court cases using statistical analysis of the data gleaned from the dataset.
In the end, this thesis strives to provide input data as well as results data for future researchers to use in studying Supreme Court behavior. It also provides a framework that researchers can use to analyze the input data to create even more results data.
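As an illustration of the kind of analysis described above, a minimal sketch (not the thesis code) for computing overrule rates by vote margin might look like the following; the column names and toy values are hypothetical stand-ins for fields in the merged LexisNexis / Supreme Court Database dataset.

    # Sketch only: overrule rate by vote margin, using hypothetical columns
    # ("margin", "overruled") as placeholders for the merged dataset fields.
    import pandas as pd

    decisions = pd.DataFrame({
        "margin":    ["5-4", "9-0", "5-4", "9-0", "5-4", "9-0"],
        "overruled": [1,      0,     0,     0,     1,     0],
    })

    rates = decisions.groupby("margin")["overruled"].mean()
    ratio = rates["5-4"] / max(rates["9-0"], 1e-9)   # guard against divide-by-zero
    print(rates)
    print(f"5-4 decisions overruled {ratio:.1f}x as often as unanimous ones")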
Item Open Access A Discrete Monolayer Cardiac Tissue Model for Tissue Preparation Specific Modeling (2010) Kim, Jongmyeong
Engineered monolayers created by using microabrasion and micropatterning methods have provided a simplified in vitro system to study the effects of anisotropy and fiber direction on electrical propagation. Interpreting the behavior in these culture systems has often been performed using classical computer models with continuous properties. Such models, however, do not account for the effects of random cell shapes, cell orientations and cleft spaces inherent in these monolayers on the resulting wavefront conduction. Additionally, when the continuous computer model is built to study impulse propagations, the intracellular conductivities of the model are commonly assigned to match the impulse conduction velocity of the model to the experimental measurement. However, this method can result in inaccurate intracellular conductivities considering the relationship among the conduction velocity, intracellular conductivities and ion channel properties. In this study, we present novel methods for modeling a monolayer cardiac tissue and for estimating intracellular conductivities from optical mapping. First, in the proposed method for modeling a monolayer of cardiac tissue, the factors governing cell shape, cell-to-cell coupling and the degree of cleft space are not constant but rather are treated as spatially random with assigned distributions. This approach makes it possible to simulate wavefront propagation in a manner analogous to performing experiments on engineered monolayer tissues. Simulated results are compared to reported experimental data measured from monolayers used to investigate the role of cellular architecture on conduction velocities and anisotropy ratios. We also present an estimate for obtaining the electrical properties from these networks and demonstrate how variations in the discrete cellular architecture affect the macroscopic conductivities. The simulation results agree with the common assumption that under normal ranges of coupling strengths, tissues whose cell shapes and connectivity show relatively uniform distributions can be represented using continuous models with conductivities derived from random discrete cellular architecture using either estimate. The results also reveal that in the presence of abrupt changes in cell orientation, local estimates of tissue properties predict smoother changes in conductivities that may not adequately predict the discrete nature of propagation at the transition sites. Second, a novel approach is proposed to estimate intracellular conductivities from the optical mapping of the monolayer cardiac tissue under subthreshold stimulus. This method uses a simplified membrane model, which represents the membrane as a second order polynomial of the membrane potential. The simplified membrane model and the intracellular conductivities are estimated from the optical mapping of the monolayer tissue under the subthreshold stimulus. We showed that the proposed method provides more accurate intracellular conductivities compared to a method using a constant membrane resistance.
Item Open Access A Formal Framework for Designing Verifiable Protocols (2017) Matthews, Opeoluwa
Protocols play critical roles in computer systems today, including managing resources, facilitating communication, and coordinating actions of components. It is highly desirable to formally verify protocols, to provide a mathematical guarantee that they behave correctly. Ideally, one would pass a model of a protocol into a formal verification tool, push a button, and the tool uncovers bugs or certifies that the protocol behaves correctly. Unfortunately, as a result of the state explosion problem, automated formal verification tools struggle to verify the increasingly complex protocols that appear in computer systems today.
We observe that design decisions have a significant impact on the scalability of verifying a protocol in a formal verification tool. Hence, we present a formal framework that guides architects in designing protocols specifically to be verifiable with state of the art formal verification tools. If architects design protocols to fit our framework, the protocols inherit scalable automated verification. Key to our framework is a modular approach to constructing protocols from pre-verified subprotocols. We formulate properties that can be proven in automated tools to hold on these subprotocols, guaranteeing that any arbitrary composition of the subprotocols behaves correctly. The result is that we can design complex hierarchical (tree) protocols that are formally verified, using fully automated tools, for any number of nodes or any configuration of the tree. Our framework is applicable to a large class of protocols, including power management, cache coherence, and distributed lock management protocols.
To demonstrate the efficacy of our framework, we design and verify a realistic hierarchical (tree) coherence protocol, using a fully automated tool to prove that it behaves correctly for any configuration of the tree. We identify certain protocol optimizations prohibited by our framework or the state of the art verification tools and we evaluate our verified protocol against unverifiable protocols that feature these optimizations. We find that these optimizations have a negligible impact on performance. We hope that our framework can be used to design a wide variety of protocols that are verifiable, high-performing, and architecturally flexible.
Item Open Access Accelerated Motion Planning Through Hardware/Software Co-Design (2019) Murray, Sean
Robotics has the potential to dramatically change society over the next decade. Technology has matured such that modern robots can execute complex motions with sub-millimeter precision. Advances in sensing technology have driven down the price of depth cameras and increased their performance. However, the planning algorithms used in currently-deployed systems are too slow to react to changing environments; this has restricted the use of high degree-of-freedom (DOF) robots to tightly-controlled environments where planning in real time is not necessary.
Our work focuses on overcoming this challenge through careful hardware/software co-design. We leverage aggressive precomputation and parallelism to design accelerators for several components of the motion planning problem. We present architectures for accelerating collision detection as well as path search. We show how we can maintain flexibility even with custom hardware, and describe microarchitectures that we have implemented at the register-transfer level. We also show how to generate effective planning roadmaps for use with our designs.
Our accelerators bring the total planning latency to less than 3 microseconds, several orders of magnitude faster than the state of the art. This capability makes it possible to deploy systems that plan under uncertainty, use complex decision making algorithms, or plan for multiple robots in a workspace. We hope this technology will push robotics into domains and applications that were previously infeasible.
Item Open Access Accelerating Data Parallel Applications via Hardware and Software Techniques (2020) Bashizade, Ramin
The unprecedented amount of data available today opens the door to many new applications in areas such as finance, scientific simulation, machine learning, etc. Many such applications perform the same computations on different data, which are called data-parallel. However, processing this enormous amount of data is challenging, especially in the post-Moore's law era. Specialized accelerators are a promising solution to meet the performance requirements of data-parallel applications. Among these are graphics processing units (GPUs), as well as more application-specific solutions.
One of the areas with high performance requirements is statistical machine learning, which has widespread applications in various domains. These methods include probabilistic algorithms, such as Markov Chain Monte-Carlo (MCMC), which rely on generating random numbers from probability distributions. These algorithms are computationally expensive on conventional processors, yet their statistical properties, namely, interpretability and uncertainty quantification compared to deep learning, make them an attractive alternative approach. Therefore, hardware specialization can be adopted to address the shortcomings of conventional processors in running these applications.
In addition to hardware techniques, probabilistic algorithms can benefit from algorithmic optimizations that aim to avoid performing unnecessary work. To be more specific, we can skip a random variable (RV) whose probability distribution function (PDF) is concentrated on only one value, i.e., there is only one value to choose, and the values of its neighboring RVs have not changed. In other words, if an RV has a concentrated PDF, its PDF will remain concentrated until at least one of its neighbors changes. Due to their high throughput and centralized scheduling mechanism, GPUs are a suitable target for this optimization.
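A minimal sketch of that skip rule on a toy binary Markov random field (not the dissertation's GPU implementation) is shown below; the model, coupling value, and concentration threshold are illustrative assumptions.

    # Sketch of the skip rule on a toy binary MRF: an RV is skipped when its
    # conditional distribution is effectively concentrated on one value and
    # none of its neighbors changed in the previous sweep.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 16
    x = rng.integers(0, 2, size=n)                      # binary RVs on a ring
    neighbors = [((i - 1) % n, (i + 1) % n) for i in range(n)]
    changed = np.ones(n, dtype=bool)                    # everything "changed" at start
    coupling, eps = 2.0, 1e-3

    def cond_prob_one(i):
        # P(x_i = 1 | neighbors), a simple Ising-style conditional
        s = sum(2 * x[j] - 1 for j in neighbors[i])
        return 1.0 / (1.0 + np.exp(-2.0 * coupling * s))

    for sweep in range(100):
        new_changed = np.zeros(n, dtype=bool)
        for i in range(n):
            p1 = cond_prob_one(i)
            concentrated = p1 < eps or p1 > 1 - eps
            neighbor_changed = any(changed[j] for j in neighbors[i])
            if concentrated and not neighbor_changed:
                continue                                # skip: outcome is predetermined
            new_val = int(rng.random() < p1)
            if new_val != x[i]:
                new_changed[i] = True
            x[i] = new_val
        changed = new_changed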
Other than probabilistic algorithms, GPUs can be utilized to accelerate a variety of applications. GPUs with their Single-Instruction Multiple-Thread (SIMT) execution model offer massive parallelism that is combined with a relative ease of programming. The large amount and diversity of resources on the GPU is intended to ensure applications with different characteristics can achieve high performance, but at the same time it means that some of these resources will remain under-utilized, which is inefficient in a multi-tenant environment.
In this dissertation, we propose and evaluate solutions to the challenges mentioned above, namely i) accelerating probabilistic algorithms with uncertainty quantification, ii) optimizing probabilistic algorithms on GPUs to avoid unnecessary work, and iii) increasing resource utilization of GPUs in multi-tenant environments.
Item Open Access Accelerating Probabilistic Computing with a Stochastic Processing Unit (2020) Zhang, Xiangyu
Statistical machine learning has become a more important workload for computing systems than ever before. Probabilistic computing is a popular approach in statistical machine learning, which solves problems by iteratively generating samples from parameterized distributions. As an alternative to Deep Neural Networks, probabilistic computing provides conceptually simple, compositional, and interpretable models. However, probabilistic algorithms are often considered too slow on conventional processors due to the sampling overhead of 1) computing the parameters of a distribution and 2) generating samples from the parameterized distribution. A specialized architecture is needed to address both of the above aspects.
In this dissertation, we claim a specialized architecture is necessary and feasible to efficiently support various probabilistic computing problems in statistical machine learning, while providing high-quality and robust results.
We start with exploring a probabilistic architecture to accelerate Markov Random Field (MRF) Gibbs Sampling by utilizing the quantum randomness of optical-molecular devices---Resonance Energy Transfer (RET) networks. We provide a macro-scale prototype, the first such system to our knowledge, to experimentally demonstrate the capability of RET devices to parameterize a distribution and run a real application. By doing a quantitative result quality analysis, we further reveal the design issues of an existing RET-based probabilistic computing unit (1st-gen RSU-G) that lead to unsatisfactory result quality in some applications. By exploring the design space, we propose a new RSU-G microarchitecture that empirically achieves the same result quality as 64-bit floating-point software, with the same area and modest power overheads compared with 1st-gen RSU-G. An efficient stochastic probabilistic unit can be fulfilled using RET devices.
The RSU-G provides high-quality true Random Number Generation (RNG). We further explore how the quality of an RNG is related to application end-point result quality. Unexpectedly, we discover that the target applications do not necessarily require high-quality RNGs---a simple 19-bit Linear-Feedback Shift Register (LFSR) does not degrade end-point result quality in the tested applications. Therefore, we propose a Stochastic Processing Unit (SPU) with a simple pseudo RNG that achieves functionality equivalent to RSU-G but maintains the benefit of a CMOS digital circuit.
The above results bring up a subsequent question: are we confident to use a probabilistic accelerator with various approximation techniques, even though the end-point result quality ("accuracy") is good in tested benchmarks? We found current methodologies for evaluating correctness of probabilistic accelerators are often incomplete, mostly focusing only on end-point result quality ("accuracy") but omitting other important statistical properties. Therefore, we claim a probabilistic architecture should provide some measure (or guarantee) of statistical robustness. We take a first step toward defining metrics and a methodology for quantitatively evaluating correctness of probabilistic accelerators. We propose three pillars of statistical robustness: 1) sampling quality, 2) convergence diagnostic, and 3) goodness of fit. We apply our framework to a representative MCMC accelerator (SPU) and surface design issues that cannot be exposed using only application end-point result quality. Finally, we demonstrate the benefits of this framework to guide design space exploration in a case study showing that statistical robustness comparable to floating-point software can be achieved with limited precision, avoiding floating-point hardware overheads.
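For instance, one convergence diagnostic of the kind such a framework might compute is the standard Gelman-Rubin statistic; the sketch below is a generic version and is not claimed to be the dissertation's exact metric.

    # Generic Gelman-Rubin R-hat over parallel chains of sampler output;
    # values near 1 suggest the chains have converged to the same distribution.
    import numpy as np

    def gelman_rubin(chains):
        """chains: array of shape (m_chains, n_samples) for one scalar quantity."""
        chains = np.asarray(chains, dtype=float)
        m, n = chains.shape
        chain_means = chains.mean(axis=1)
        chain_vars = chains.var(axis=1, ddof=1)
        b = n * chain_means.var(ddof=1)        # between-chain variance
        w = chain_vars.mean()                  # within-chain variance
        var_hat = (n - 1) / n * w + b / n
        return np.sqrt(var_hat / w)

    rng = np.random.default_rng(0)
    chains = rng.normal(size=(4, 1000))        # four well-mixed toy chains
    print(gelman_rubin(chains))                # expected to be close to 1.0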
Item Open Access Accelerator Architectures for Deep Learning and Graph Processing (2020) Song, Linghao
Deep learning and graph processing are two big-data applications and they are widely applied in many domains. The training of deep learning is essential for inference and has not yet been fully studied. With data forward, error backward, and gradient calculation, deep learning training is a more complicated process with higher computation and communication intensity. Distributing computations across multiple heterogeneous accelerators to achieve high throughput and balanced execution, however, remains challenging. In this dissertation, I present AccPar, a principled and systematic method of determining the tensor partition for multiple heterogeneous accelerators for efficient training acceleration. Emerging resistive random access memory (ReRAM) is promising for processing in memory (PIM). For high-throughput training acceleration in a ReRAM-based PIM accelerator, I present PipeLayer, an architecture for layer-wise pipelined parallelism. Graph processing is well-known for poor locality and high memory bandwidth demand. In conventional architectures, graph processing incurs a significant amount of data movement and energy consumption. I present GraphR, the first ReRAM-based graph processing accelerator, which follows the principle of near-data processing and explores the opportunity of performing massive parallel analog operations with low hardware and energy cost. Sparse matrix-vector multiplication (SpMV), a subset of graph processing, is the key computation in iterative solvers for scientific computing. Efficiently accelerating floating-point processing in ReRAM remains a challenge. In this dissertation, I present ReFloat, a data format, and a supporting accelerator architecture, for low-cost floating-point processing in ReRAM for scientific computing.
Item Open Access Adaptive Methods for Machine Learning-Based Testing of Integrated Circuits and Boards (2020) Liu, Mengyun
The relentless growth in information technology and artificial intelligence (AI) is placing demands on integrated circuits and boards for high performance, added functionality, and low power consumption. As a result, design complexity and integration continue to increase, and emerging devices are being explored. However, these new trends lead to high test cost and challenges associated with semiconductor test.
Machine learning has emerged as a powerful enabler in various application domains, and it provides an opportunity to overcome the challenges associated with expert-based test. By taking advantage of powerful machine-learning techniques, useful information can be extracted from historical test data, and this information helps facilitate the testing process for both chips and boards.
Moreover, to attain test cost reduction with no test quality degradation, adaptive methods for testing are now being advocated. In conventional testing methods, variations among different chips and different boards are ignored. As a result, the same test items are applied to all chips; online testing is carried out at fixed intervals; immutable fault-diagnosis models are used for all boards. In contrast, adaptive methods observe changes in the distribution of testing data and dynamically adjust the testing process, and hence reduce the test cost. In this dissertation, we study solutions for both chip-level test and board-level test. Our objective is to design the most suitable solutions for adapting machine-learning techniques to the testing domain.
For chip-level test, the dissertation first presents machine learning-based adaptive testing to drop unnecessary test items and reduce the test cost in high-volume chip manufacturing. The proposed testing framework uses the parametric test results from circuit probing test to train a quality-prediction model, partitions chips into different groups based on the predicted quality, and selects the different important test items for each group of chips. To achieve the same defect level as in prior work on adaptive testing, the proposed fine-grained adaptive testing method significantly reduces test cost.
Besides CMOS-based chips, emerging devices (e.g., resistive random access memory (ReRAM)) are being explored to implement AI chips with high energy efficiency. Due to the immature fabrication process, ReRAMs are vulnerable to dynamic faults. Instead of periodically interrupting the computing process and carrying out the testing process, the dissertation presents an efficient method to detect the occurrence of dynamic faults in ReRAM crossbars. This method monitors an indirect measure of the dynamic power consumption of each ReRAM crossbar and determines the occurrence of faults when a changepoint is detected in the monitored power-consumption time series. The method also estimates the percentage of faulty cells in a ReRAM crossbar by training a machine learning-based predictive model. In this way, the time-consuming fault localization and error recovery steps are only carried out when a high defect rate is estimated, and hence the test time is considerably reduced.
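A toy illustration of flagging a changepoint in a monitored power trace, here with a simple CUSUM test standing in for the dissertation's actual detector, might look like the following; the data and thresholds are placeholders.

    # Illustrative sketch only: a CUSUM-style test for a shift in a monitored
    # power-consumption series; the dissertation's actual method is not shown.
    import numpy as np

    def cusum_changepoint(series, drift=0.0, threshold=5.0):
        mean0 = np.mean(series[:50])           # baseline from an initial window
        std0 = np.std(series[:50]) + 1e-9
        g_pos = g_neg = 0.0
        for t, x in enumerate(series):
            z = (x - mean0) / std0
            g_pos = max(0.0, g_pos + z - drift)
            g_neg = max(0.0, g_neg - z - drift)
            if g_pos > threshold or g_neg > threshold:
                return t                       # index where a change is flagged
        return None

    rng = np.random.default_rng(1)
    power = np.concatenate([rng.normal(1.0, 0.05, 200),    # nominal crossbar power
                            rng.normal(1.3, 0.05, 100)])   # shift after faults appear
    print(cusum_changepoint(power))            # expect an index shortly after 200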
For board-level test, the cost associated with the diagnosis and repair due to board-level failures is one of the highest contributors to board manufacturing cost. To reduce the cost associated with fault diagnosis, a machine learning-based diagnosis workflow has been developed to support board-level functional fault identification in the dissertation. In a production environment, the large volume of manufacturing data comes in a streaming format and may exhibit a time-dependent concept drift. In order to process streaming data and adapt to concept drifts, instead of using an immutable diagnosis model, this dissertation also presents a method that uses an online learning algorithm to incrementally update the identification model. Experimental results show that, with the help of online learning, the diagnosis accuracy is improved, and the training time is significantly reduced.
The machine learning-based diagnosis workflow can identify board-level functional faults with high accuracy. However, the prediction accuracy is low when a new board has a limited amount of fail data and repair records. The dissertation presents a diagnosis system that can utilize domain-adaptation algorithms to transfer the knowledge learned from a mature board to a new board. Domain adaptation significantly reduces the requirement for the number of repair records from the new board, while achieving a relatively high diagnostic accuracy in the early stage of manufacturing a new product. The proposed domain adaptation workflow designs a metric to evaluate the similarity between two types of boards. Based on the calculated similarity value, different domain-adaptation algorithms are selected to transfer knowledge and train a diagnosis model.
In summary, this dissertation tackles important problems related to the testing of integrated circuits and boards. By considering variations among different chips or boards, machine learning-based adaptive methods enable the reduction of test cost. The proposed machine learning-based testing methods are expected to contribute to quality assurance and manufacturing-cost reduction in the semiconductor industry.
Item Open Access Advancing the Design and Utility of Adversarial Machine Learning Methods (2021) Inkawhich, Nathan Albert
While significant progress has been made to craft Deep Neural Networks (DNNs) with super-human recognition performance, their reliability and robustness in challenging operating conditions are still a major concern. In this work, we study multiple facets of the DNN robustness problem by pursuing two main threads of research. The key methodological linkage throughout our investigations is the consistent design/development/utilization/deployment of Adversarial Machine Learning techniques, which have remarkable abilities to both degrade and enhance model performance. Our ultimate goal is to help construct the safer and more reliable models of the future.
In the first thread of research, we take the perspective of an adversary who wishes to find novel and increasingly potent ways to fool current DNN models. Our approach is centered around the development of a feature space attack, and the construction of novel adversarial threat models that work to reduce required knowledge assumptions. Interestingly, we find that a transfer-based blackbox adversary can be significantly more powerful than previously believed, and can reliably cause targeted misclassifications with imperceptible noises. Further, we find that the attacker does not necessarily require access to the target model's training distribution to create transferable attacks, which is a more practically concerning scenario due to the reduction of required attacker knowledge.
Along the second thread of research, we take the perspective of a DNN model designer whose job is to create systems capable of robust operation in "open-world" environments, where both known and unknown target types may be encountered. Our approach is to establish a classifier + out-of-distribution (OOD) detector system co-design that is centered around an adversarial training procedure and an outlier exposure-based learning objective. Through various experiments, we find that our systems can achieve high accuracy in extended operating conditions, while reliably detecting and rejecting fine-grained OOD target types. We also develop a method for efficiently improving OOD detection by learning from the deployment environment. Overall, by exposing novel vulnerabilities of current DNNs while also improving the reliability of existing models to known vulnerabilities, our work makes significant progress towards creating the next-generation of more trustworthy models.
Item Unknown ADVANCING VISION INTELLIGENCE THROUGH THE DEVELOPMENT OF EFFICIENCY, INTERPRETABILITY AND FAIRNESS IN DEEP LEARNING MODELS (2024) Kong, Fanjie
Deep learning has demonstrated remarkable success in developing vision intelligence across a variety of application domains, including autonomous driving, facial recognition, medical image analysis, etc. However, developing such vision systems poses significant challenges, particularly in relation to ensuring efficiency, interpretability, and fairness. Efficiency requires a model to leverage the least possible computational resources while preserving performance relative to more computationally-demanding alternatives, which is essential for the practical deployment of large-scale models in real-time applications. Interpretability demands a model to align with the domain-specific knowledge of the task it addresses while having the capability for case-based reasoning. This characteristic is especially crucial in high-stakes areas such as healthcare, criminal justice, and financial investment. Fairness ensures that computer vision models do not perpetuate or exacerbate societal biases in downstream applications such as web image search, text-guided image generation, etc. In this dissertation, I will discuss the contributions that I have made in advancing vision intelligence with regard to efficiency, interpretability, and fairness in computer vision models.
The first part of this dissertation will focus on how to design computer vision models to efficiently process very large images. We propose a novel CNN architecture termed Zoom-In Network that leverages a hierarchical attention sampling mechanism to select important regions of images to process. This approach, which avoids processing the entire image, yields outstanding memory efficiency while maintaining classification accuracy on various tiny-object image classification datasets.
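The attention-sampling idea can be illustrated with a toy sketch (score coarse tiles, then keep only the top-k for full processing); this is a generic stand-in, not the Zoom-In Network implementation.

    # Toy sketch of attention sampling: score coarse tiles of a very large
    # image, then process only the k most salient tiles at full resolution.
    import numpy as np

    rng = np.random.default_rng(0)
    image = rng.random((4096, 4096))            # a "very large" single-channel image
    tile = 512
    tiles, coords = [], []
    for r in range(0, image.shape[0], tile):
        for c in range(0, image.shape[1], tile):
            tiles.append(image[r:r + tile, c:c + tile])
            coords.append((r, c))

    # Toy attention score per tile (a small CNN would produce these in practice).
    scores = np.array([p.mean() for p in tiles])
    k = 4
    top = np.argsort(scores)[-k:]               # keep only the k most salient tiles
    selected = [(coords[i], tiles[i]) for i in top]
    print([c for c, _ in selected])             # these tiles alone are processed further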
The second part of this dissertation will discuss how to build a post-hoc interpretation method for deep learning models to obtain insights reasoned from their predictions. We propose a novel image and text insight-generation framework based on attributions from deep neural nets. We test our approach on an industrial dataset and demonstrate that our method outperforms competing methods.
Finally, we study fairness in large vision-language models. More specifically, we examine gender and racial bias in text-based image retrieval for neutral text queries. To address bias in the test-time phase, we propose post-hoc bias mitigation to actively balance the demographic groups in the image search results. Experiments on multiple datasets show that our method can significantly reduce bias while maintaining satisfactory retrieval accuracy at the same time.
My research in enhancing vision intelligence via developments in efficiency, interpretability, and fairness has undergone rigorous validation using publicly available benchmarks and has been recognized at leading peer-reviewed machine learning conferences. This dissertation has sparked interest within the AI community, emphasizing the importance of improving computer vision models along these three critical dimensions: efficiency, interpretability, and fairness.
Item Open Access Algorithm-hardware co-optimization for neural network efficiency improvement (2020) Yang, Qing
Deep neural networks (DNNs) are widely applied throughout the artificial intelligence field. While the performance of DNNs is continuously improved by more complicated and deeper structures, the feasibility of deployment on edge devices remains a critical problem. In this thesis, we present algorithm-hardware co-optimization approaches to address the challenges of efficient DNN deployments from three aspects: 1) saving computational cost, 2) saving memory cost, and 3) saving data movement.
First, we present a joint regularization technique to advance the compression beyond the weights to neuron activations. By distinguishing and leveraging the significant difference among neuron responses and connections during learning, the jointly pruned network, namely JPnet, optimizes the sparsity of activations and weights. Second, to structurally regulate the dynamic activation sparsity (DAS), we propose a generic low-cost approach based on the winners-take-all (WTA) dropout technique. The network enhanced by the proposed WTA dropout, namely DASNet, features structured activation sparsity with an improved sparsity level, which can be easily utilized to achieve acceleration on conventional embedded systems. The effectiveness of JPnet and DASNet has been thoroughly evaluated through various network models with different activation functions and on different datasets. Third, we propose BitSystolic, a neural processing unit based on a systolic array structure, to fully support mixed-precision inference. In BitSystolic, the numerical precision of both weights and activations can be configured in the range of 2b~8b, fulfilling different requirements across mixed-precision models and tasks. Moreover, the design can support various data flows presented in different types of neural layers and adaptively optimize the data reuse by switching between the matrix-matrix mode and vector-matrix mode. We designed and fabricated the proposed BitSystolic in the 65nm process. Our measurement results show that BitSystolic achieves a unified power efficiency of up to 26.7 TOPS/W with 17.8 mW peak power consumption across various layer types. Finally, we take a brief look at computing-in-memory architectures based on resistive random-access memory (ReRAM), which realize in-place storage and computation. A quantized training method is proposed to enhance the accuracy of neuromorphic systems based on ReRAM by alleviating the impact of limited parameter precision.
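The winners-take-all idea behind DASNet can be illustrated by a toy function that keeps only the top-k activations per sample; this sketch is generic and is not the dissertation's training procedure.

    # Toy winners-take-all (WTA) activation sparsity: keep the top-k activations
    # per sample and zero the rest, yielding structured activation sparsity.
    import numpy as np

    def wta_mask(activations, keep_ratio=0.25):
        """Zero all but the largest keep_ratio fraction of activations per row."""
        a = np.asarray(activations, dtype=float)
        k = max(1, int(keep_ratio * a.shape[1]))
        thresh = np.partition(a, -k, axis=1)[:, -k][:, None]
        return a * (a >= thresh)

    rng = np.random.default_rng(0)
    acts = np.maximum(rng.normal(size=(2, 8)), 0)    # ReLU-like activations
    print(wta_mask(acts, keep_ratio=0.25))           # sparser, structured activations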
Item Open Access Anomaly-Detection and Health-Analysis Techniques for Core Router Systems (2018) Jin, Shi
A three-layer hierarchy is typically used in modern telecommunication systems in order to achieve high performance and reliability. The three layers, namely core, distribution, and access, perform different roles for service fulfillment. The core layer is also referred to as the network backbone, and it is responsible for the transfer of a large amount of traffic in a reliable and timely manner. The network devices (such as routers) in the core layer are vulnerable to hard-to-detect/hard-to-recover errors. For example, the cards that constitute core router systems and the components that constitute a card can encounter hardware failures. Moreover, connectors between cards and interconnects between different components inside a card are also subject to hard faults. Also, since the performance requirement of network devices in the core layer is approaching Tbps levels, failures caused by subtle interactions between parallel threads or applications have become more frequent. All these different types of faults can cause a core router to become incapacitated, necessitating the design and implementation of fault-tolerant mechanisms in the core layer.
Proactive fault tolerance is a promising solution because it takes preventive action before a failure occurs. The state of the system is monitored in a real-time manner. When anomalies are detected, proactive repair actions such as job migration are executed to avoid errors, thereby maintaining the non-stop utilization of the entire system. The effectiveness of proactive fault-tolerance solutions depends on whether abnormal behaviors of core routers can be accurately pinpointed in a timely manner.
This dissertation first presents an anomaly detector for core router systems using correlation-based time series analysis. The proposed technique monitors a set of features obtained from a system deployed in the field. Various types of correlations among extracted features are identified. A set of features with minimum redundancy and maximum relevance are then grouped into different categories based on their statistical characteristics. A hybrid approach is developed to analyze various feature categories using a combination of different anomaly detection methods, leading to the detection of realistic anomalies.
Next, this dissertation presents the design of a changepoint-based anomaly detector such that anomaly detection can be adaptive to changes in the statistical features of data streams. The proposed method first detects changepoints from collected time-series data, and then utilizes these changepoints to detect anomalies. A clustering method is developed to identify a wide range of normal/abnormal patterns from changepoint windows. Experimental results show that the changepoint-based anomaly detector can detect outliers even when the statistical properties of the monitored data change significantly with time.
An efficient data-driven anomaly detector is not adequate to obtain a full picture of the health status of monitored core routers. It is also essential to learn how healthy a core router system is and how different task scenarios can affect the system. Therefore, this dissertation presents a symbol-based health status analyzer that first encodes, as a symbol sequence, the long-term complex time series collected from a number of core routers, and then utilizes the symbol sequence for health analysis. Symbol-based clustering and classification methods are developed to identify the health status.
In order to accurately identify the health status, historical operation data needs to be fully labeled, which is a challenge in the early stages of monitoring. Therefore, this dissertation presents an iterative self-learning procedure for assessing the health status. This procedure first computes a representative feature matrix to capture different characteristics of time-series data. Hierarchical clustering is then utilized to infer labels for the unlabeled dataset. Finally, a classifier is built and iteratively updated using both labeled and unlabeled datasets. Partially-labeled field data collected from a set of commercial core routers are used to experimentally validate the proposed method.
In summary, the dissertation tackles important problems of anomaly detection and health status analysis in complex core router systems. The results emerging from this dissertation provide the first comprehensive set of data-driven resiliency solutions for core router systems. It is anticipated that other high-performance computing systems will also benefit from this framework.
Item Open Access Applying Machine Learning to Testing and Diagnosis of Integrated Systems (2021) Pan, Renjian
The growing complexity of integrated boards and systems makes manufacturing test and diagnosis increasingly expensive. There is a pressing need to reduce test cost and to pinpoint the root causes of integrated systems in a more effective way. In light of machine learning, a number of intelligent test-cost reduction and root-cause analysis methods have been proposed. However, it remains extremely challenging to (i) reduce test cost for black-box testing for integrated systems, and (ii) pinpoint the root causes for integrated systems with little need for labeled test data from repair history. To tackle these challenges, we propose multiple machine-learning-based solutions for black-box test-cost reduction and unsupervised/semi-supervised root-cause analysis in this dissertation.
For black-box test-cost reduction, we propose a novel test selection method based on a Bayesian network model. First, it is formulated as a constrained optimization problem. Next, a score-based algorithm is implemented to construct the Bayesian network for black-box tests. Finally, we propose a Bayesian index with the property of Markov blankets, and then an iterative test selection method is developed based on our proposed Bayesian index.
For root-cause analysis, we first propose an unsupervised root-cause analysis method in which no repair history is needed. In the first stage, a decision-tree model is trained with system test information to cluster the data in a coarse-grained manner. In the second stage, frequent-pattern mining is applied to extract frequent patterns in each decision-tree node to precisely cluster the data so that each cluster represents only a small number of root causes. The proposed method can accommodate both numerical and categorical test items. A combination of the L-method, cross validation and Silhouette score enables us to automatically determine all hyper-parameters. Two industry case studies with system test data demonstrate that the proposed approach significantly outperforms the state-of-the-art unsupervised root-cause-analysis method.
Utilizing transfer learning, we further improve the performance of unsupervised root-cause analysis. A two-stage clustering method is first developed by exploiting model selection based on the concept of Silhouette score. Next, a data-selection method based on ensemble learning is proposed to transfer valuable information from a source product to improve the diagnosis accuracy on the target product with insufficient data. Two case studies based on industry designs demonstrate that the proposed approach significantly outperforms other state-of-the-art unsupervised root-cause-analysis methods.
In addition, we propose a semi-supervised root-cause-analysis method with co-training, where only a small set of labeled data is required. Using random forest as the learning kernel, a co-training technique is proposed to leverage the unlabeled data by automatically pre-labeling a subset of them and retraining each decision tree. In addition, several novel techniques have been proposed to avoid over-fitting and determine hyper-parameters. Two case studies based on industrial designs demonstrate that the proposed approach significantly outperforms the state-of-the-art methods.
In summary, this dissertation addresses the most difficult problems in testing and diagnosis of integrated systems with machine learning.
A test selection method based on Bayesian networks reduces the test cost for black-box testing. With unsupervised learning, semi-supervised learning, and transfer learning, we analyze root causes for integrated systems with little need for historical diagnosis information. The proposed approaches are expected to contribute to the semiconductor industry by effectively reducing the black-box test cost and efficiently diagnosing the integrated systems.
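As a generic illustration of one ingredient of this flow, the sketch below picks the number of clusters by Silhouette score on hypothetical test signatures; the full two-stage pipeline (decision tree, frequent-pattern mining, L-method, cross validation) is not reproduced here.

    # Generic sketch: choose the number of clusters for root-cause grouping by
    # Silhouette score on hypothetical numerical test signatures of failed boards.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 4)) for c in (0, 3, 6)])

    best_k, best_score = None, -1.0
    for k in range(2, 8):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        score = silhouette_score(X, labels)
        if score > best_score:
            best_k, best_score = k, score
    print(best_k, round(best_score, 3))        # expect k = 3 on this toy data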
Item Open Access Architectures for Memristor-based Storage Structures (2011) Liu, Yang
Rapid data growth nowadays makes it more critical to reduce search time to improve the performance of search-intensive applications. However, huge data size makes it more difficult to efficiently perform search operations. Representative conventional approaches to reduce search time, such as CAM and in-memory databases, are no longer efficient because of the data explosion: CMOS-based CAM has low capacity which cannot be increased through CMOS scaling, and in-memory databases have performance degradation as data size increases. As a result, we have to exploit emerging nanotechnologies to accelerate search.
Among emerging nanotechnologies, memristors have become promising candidates to build storage structures because of high capacity, short switching time and low power consumption. However, the benefit we can obtain from these storage structures is limited by low endurance of memristors. In order to utilize the computation ability of memristors and deal with the endurance problem, we explore the design space of memristor-based storage structures.
We first propose MemCAM/MemTCAM, a configurable memristor-based CAM/TCAM design, in which we use memristors as both memory latches and logic gates. Computation ability of memristors makes it possible to perform range search and high density of memristors provides an opportunity to build MemCAM/MemTCAM with large capacity and small area. We use SPICE to model the memristor and analyze power and performance at different temperatures. The results show that it is feasible to build MemCAM and MemTCAM which have high capacity and can reduce total search time and energy consumption for search-intensive applications with huge data size.
We then propose four hybrid memristor-based storage structures, Hash-CAM, T-tree-CAM, TB+-tree, and TB+-tree-CAM, to solve the endurance problem. We use an analytical model to evaluate and compare the performance and lifetime of two software-implemented memory-based T-trees and these four hybrid storage structures. The results show that hybrid storage structures can utilize range search abilities, achieve better performance than memory-based T-trees, and improve lifetime from minutes to longer than 60 years. Furthermore, TB+-tree-CAM, a hybrid memristor-based storage structure combining T-tree, B+-tree and CAM, manages to balance between performance and lifetime and can outperform other storage structures when taking both performance and lifetime into consideration.
Item Open Access Assisting Unsupervised Optical Flow Estimation with External Information (2023) Yuan, Shuai
Optical flow estimation is a long-standing problem in computer vision with broad applications in autonomous driving, robotics, etc. Due to the scarcity of ground-truth labels, the unsupervised estimation of optical flow is especially important. However, it is a poorly constrained problem and presents challenges in the presence of occlusions, motion boundaries, non-Lambertian surfaces, lack of texture, and illumination changes. Therefore, we explore using external information, namely partial labels, semantics, and stereo views, to assist unsupervised optical flow estimation.
Supervised training of optical flow predictors generally yields better accuracy than unsupervised training. However, the improved performance comes at an often high annotation cost. Semi-supervised training trades off accuracy against annotation cost. We use a simple yet effective semi-supervised training method to show that even a small fraction of labels can improve flow accuracy by a significant margin over unsupervised training. In addition, we propose active learning methods based on simple heuristics to further reduce the number of labels required to achieve the same target accuracy. Our experiments on both synthetic and real optical flow datasets show that our semi-supervised networks generally need around 50% of the labels to achieve close to full-label accuracy, and only around 20% with active learning on Sintel. We also analyze and show insights on the factors that may influence active learning performance. Code is available at https://github.com/duke-vision/optical-flow-active-learning-release.
Unsupervised optical flow estimation is especially hard near occlusions and motion boundaries and in low-texture regions. We show that additional information such as semantics and domain knowledge can help better constrain this problem. We introduce SemARFlow, an unsupervised optical flow network designed for autonomous driving data that takes estimated semantic segmentation masks as additional inputs. This additional information is injected into the encoder and into a learned upsampler that refines the flow output. In addition, a simple yet effective semantic augmentation module provides self-supervision when learning flow and its boundaries for vehicles, poles, and sky. Together, these injections of semantic information improve the KITTI-2015 optical flow test error rate from 11.80% to 8.38%. We also show visible improvements around object boundaries as well as a greater ability to generalize across datasets. Code is available at https://github.com/duke-vision/semantic-unsup-flow-release.
Both optical flow and stereo disparities are image matches and can therefore benefit from joint training. Depth and 3D motion provide geometric rather than photometric information and can further improve optical flow. Accordingly, we design a first network that estimates flow and disparity jointly and is trained without supervision. A second network, trained with optical flow from the first as pseudo-labels, takes disparities from the first network, estimates 3D rigid motion at every pixel, and reconstructs optical flow again. A final stage fuses the outputs from the two networks. In contrast with previous methods that only consider camera motion, our method also estimates the rigid motions of dynamic objects, which are of key interest in applications.
This leads to better optical flow, with visibly more detailed occlusions and object boundaries. Our unsupervised pipeline achieves 7.36% optical flow error on the KITTI-2015 benchmark and outperforms the previous state of the art (9.38%) by a wide margin. It also achieves slightly better or comparable stereo depth results. Code will be made available.
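The semi-supervised objective described above can be sketched as a supervised endpoint-error term on the small labeled subset plus an unsupervised photometric term on unlabeled pairs; the toy version below omits the networks and warping and is only illustrative, not the dissertation's training code.

    # Toy semi-supervised flow objective: supervised endpoint error on labeled
    # samples plus a photometric term on unlabeled frame pairs.
    import numpy as np

    def endpoint_error(flow_pred, flow_gt):
        return np.mean(np.linalg.norm(flow_pred - flow_gt, axis=-1))

    def photometric_loss(img1, img2_warped_to_1):
        # In practice, img2 is warped toward img1 using the predicted flow.
        return np.mean(np.abs(img1 - img2_warped_to_1))

    def semi_supervised_loss(pred_l, gt_l, img1_u, img2w_u, weight=1.0):
        sup = endpoint_error(pred_l, gt_l)          # labeled fraction only
        unsup = photometric_loss(img1_u, img2w_u)   # all unlabeled pairs
        return sup + weight * unsup

    rng = np.random.default_rng(0)
    pred_l, gt_l = rng.random((8, 8, 2)), rng.random((8, 8, 2))
    img1_u, img2w_u = rng.random((8, 8)), rng.random((8, 8))
    print(semi_supervised_loss(pred_l, gt_l, img1_u, img2w_u, weight=0.5))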
Item Open Access Attack Countermeasure Trees: A Non-state-space Approach Towards Analyzing Security and Finding Optimal Countermeasure Set (2010) Roy, Arpan
Attack tree (AT) is one of the widely used non-state-space models in security analysis. The basic formalism of AT does not take into account defense mechanisms. Defense trees (DTs) have been developed to investigate the effect of defense mechanisms using measures such as attack cost, security investment cost, return on attack (ROA) and return on investment (ROI). DT, however, places defense mechanisms only at the leaf nodes and the corresponding ROI/ROA analysis does not incorporate the probabilities of attack. In attack response tree (ART), attack and response are both captured but ART suffers from the problem of state-space explosion, since the solution of ART is obtained by means of a state-space model. In this paper, we present a novel attack tree paradigm called attack countermeasure tree (ACT) which avoids the generation and solution of the state-space model and takes into account attacks as well as countermeasures (in the form of detection and mitigation events). In ACT, detection and mitigation are allowed not just at the leaf node but also at the intermediate nodes, while at the same time the state-space explosion problem is avoided in its analysis. We use single and multiobjective optimization to find optimal countermeasures under different constraints. We illustrate the features of ACT using several case studies.
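As a toy, single-objective illustration of the countermeasure-selection problem (maximize covered attack probability under a cost budget), a brute-force sketch over a hypothetical example might look like the following; the paper's actual ACT formulation and multi-objective analysis are not reproduced.

    # Toy single-objective countermeasure selection under a cost budget.
    # Countermeasure costs, coverage sets, and attack probabilities are hypothetical.
    from itertools import combinations

    # countermeasure: (cost, {attack events it detects/mitigates})
    countermeasures = {
        "cm1": (30, {"a1", "a2"}),
        "cm2": (20, {"a2"}),
        "cm3": (25, {"a3", "a4"}),
        "cm4": (10, {"a1"}),
    }
    attack_prob = {"a1": 0.4, "a2": 0.3, "a3": 0.2, "a4": 0.1}
    budget = 60

    best_set, best_value = frozenset(), 0.0
    names = list(countermeasures)
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            cost = sum(countermeasures[c][0] for c in subset)
            if cost > budget:
                continue
            covered = set().union(*(countermeasures[c][1] for c in subset)) if subset else set()
            value = sum(attack_prob[a] for a in covered)
            if value > best_value:
                best_set, best_value = frozenset(subset), value
    print(sorted(best_set), best_value)     # e.g., cm1 + cm3 covers all attacks within budget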
Item Open Access Autonomous Sensor Path Planning and Control for Active Information Gathering (2014) Lu, Wenjie
Sensor path planning and control refer to the problems of determining the trajectory and feedback control law that best support sensing objectives, such as monitoring, detection, classification, and tracking. Many autonomous systems developed, for example, to conduct environmental monitoring, search-and-rescue operations, demining, or surveillance, consist of a mobile vehicle instrumented with a suite of proprioceptive and exteroceptive sensors characterized by a bounded field-of-view (FOV) and a performance that is highly dependent on target and environmental conditions and, thus, on the vehicle position and orientation relative to the target and the environment. As a result, the sensor performance can be significantly improved by planning the vehicle motion and attitude in concert with the measurement sequence. This dissertation develops a general and systematic approach for deriving information-driven path planning and control methods that maximize the expected utility of the sensor measurements subject to the vehicle kinodynamic constraints.
The approach is used to develop three path planning and control methods: the information potential method (IP) for integrated path planning and control, the optimized coverage planning based on the Dirichlet process-Gaussian process (DP-GP) expected Kullback-Leibler (KL) divergence, and the optimized visibility planning for simultaneous target tracking and localization. The IP method is demonstrated on a benchmark problem, referred to as treasure hunt, in which an active vision sensor is mounted on a mobile unicycle platform and is deployed to classify stationary targets characterized by discrete random variables, in an obstacle-populated environment. In the IP method, an artificial potential function is generated from the expected conditional mutual information of the targets and is used to design a closed-loop switched controller. The information potential is also used to construct an information roadmap for escaping local minima. Theoretical analysis shows that the closed-loop robotic system is asymptotically stable and that an escaping path can be found when the robotic sensor is trapped in a local minimum. Numerical simulation results show that this method outperforms rapidly-exploring random trees and classical potential methods. The optimized coverage planning method maximizes the DP-GP expected KL divergence approximated by Monte Carlo integration in order to optimize the information value of a vision sensor deployed to track and model multiple moving targets. The variance of the KL approximation error is proven to decrease linearly with the inverse of the number of samples. This approach is demonstrated through a camera-intruder problem, in which the camera pan, tilt, and zoom variables are controlled to model multiple moving targets with unknown kinematics by nonparametric DP-GP mixture models. Numerical simulations as well as physical experiments show that the optimized coverage planning approach outperforms other applicable algorithms, such as methods based on mutual information, rule-based systems, and randomized planning. The third approach developed in this dissertation, referred to as optimized visibility motion planning, uses the output of an extended Kalman filter (EKF) algorithm to optimize the simultaneous tracking and localization performance of a robot equipped with proprioceptive and exteroceptive sensors, that is deployed to track a moving target in a global positioning system (GPS) denied environment.
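A generic Monte Carlo estimator of an expected KL divergence has the form below, which also exhibits the O(1/N) variance behavior mentioned above; the symbols are placeholders, and the exact DP-GP expression from the dissertation is not reproduced.

    \hat{D}_N = \frac{1}{N} \sum_{i=1}^{N} D_{\mathrm{KL}}\!\left( p(\cdot \mid \theta_i) \,\|\, q(\cdot) \right),
    \qquad \theta_i \sim \pi, \qquad
    \operatorname{Var}\!\left[ \hat{D}_N \right] = \mathcal{O}\!\left( \tfrac{1}{N} \right).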
Because active sensors with multiple modes can be modeled as a switched hierarchical system, the sensor path planning problem can be viewed as a hybrid optimal control problem involving both discrete and continuous state and control variables. For example, several authors have shown that a sensor with multiple modalities is a switched hybrid system that can be modeled by a hierarchical control architecture with components of mission planning, trajectory planning, and robot control. Then, the sensor performance can be represented by two Lagrangian functions, one function of the discrete state and control variables, and one function of the continuous state and control variables. Because information value functions are typically nonlinear, this dissertation also presents an adaptive dynamic programming approach for the model-free control of nonlinear switched systems (hybrid ADP), which is capable of learning the optimal continuous and discrete controllers online. The hybrid ADP approach is based on new recursive relationships derived in this dissertation and is proven to converge to the solution of the hybrid optimal control problem. Simulation results show that the hybrid ADP approach is capable of converging to the optimal controllers by minimizing the cost-to-go online based on a fully observable state vector.
Item Open Access Bayesian Nonparametric Modeling of Latent Structures (2014) Xing, Zhengming
An unprecedented amount of data has been collected in diverse fields such as social networks, infectious disease, and political science in this era of information explosion. The high dimensional, complex and heterogeneous data imposes tremendous challenges on traditional statistical models. Bayesian nonparametric methods address these challenges by providing models that can fit the data with growing complexity. In this thesis, we design novel Bayesian nonparametric models for datasets from three different fields: hyperspectral image analysis, infectious disease, and voting behavior.
First, we consider analysis of noisy and incomplete hyperspectral imagery, with the objective of removing the noise and inferring the missing data. The noise statistics may be wavelength-dependent, and the fraction of data missing (at random) may be substantial, including potentially entire bands, offering the potential to significantly reduce the quantity of data that need be measured. We achieve this objective by employing Bayesian dictionary learning model, considering two distinct means of imposing sparse dictionary usage and drawing the dictionary elements from a Gaussian process prior, imposing structure on the wavelength dependence of the dictionary elements.
Second, a Bayesian statistical model is developed for analysis of the time-evolving properties of infectious disease, with a particular focus on viruses. The model employs a latent semi-Markovian state process, and the state-transition statistics are driven by three terms: ($i$) a general time-evolving trend of the overall population, ($ii$) a semi-periodic term that accounts for effects caused by the days of the week, and ($iii$) a regression term that relates the probability of infection to covariates (here, specifically, to the Google Flu Trends data).
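One hedged way to write the three-term structure described above is the following, where the link function, symbols, and additive form are illustrative assumptions rather than the thesis's exact model:

    \mathrm{logit}\, p_t = \underbrace{f(t)}_{\text{overall population trend}}
    + \underbrace{s\big(d(t)\big)}_{\text{day-of-week effect}}
    + \underbrace{\boldsymbol{\beta}^{\top}\mathbf{x}_t}_{\text{covariates (e.g., Google Flu Trends)}}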
Third, extensive information on 3 million randomly sampled United States citizens is used to construct a statistical model of constituent preferences for each U.S. congressional district. This model is linked to the legislative voting record of the legislator from each district, yielding an integrated model for constituency data, legislative roll-call votes, and the text of the legislation. The model is used to examine the extent to which legislators' voting records are aligned with constituent preferences, and the implications of that alignment (or lack thereof) on subsequent election outcomes. The analysis is based on a Bayesian nonparametric formalism, with fast inference via a stochastic variational Bayesian analysis.
Item Open Access Characterizing and Mitigating Errors in Quantum Computers (2023) Majumder, Swarnadeep
This thesis aims to present methods for characterizing and mitigating errors in quantum computers. We begin by providing a historical overview of computing devices and the evolution of quantum information. The basics of characterizing noise in quantum computers and the utilization of quantum control and error mitigation techniques to reduce the impact of noise on performance are also discussed. In the initial part of the thesis, we focus on a particularly detrimental type of time-dependent errors and derive theoretical limits of a closed-loop feedback based quantum control protocol for their mitigation. Two different protocols, one suitable for fault-tolerant systems and another for near-term devices, are presented and their performance is demonstrated through numerical simulations. Additionally, we explore the mitigation of coherent noise at the circuit level through the use of the hidden inverses protocol with results from experiments conducted at Duke University, Sandia National Laboratories, and IBM. Finally, we propose a scalable error characterization procedure for large quantum systems, which is tested through numerical simulations to highlight its sensitivity to various sources of noise. Crucially, this protocol does not require access to ideal classical simulation of quantum circuits unlike other benchmarks such as quantum volume or cross entropy benchmarks.
Item Open Access Closed-Loop Deep Brain Stimulation in Parkinson’s Disease with Distributed, Proportional plus Integral Control (2022) Chowdhury, Afsana Hoque
Continuous deep brain stimulation (cDBS) of either subthalamic nucleus (STN) or globus pallidus (GP) is an effective therapy in Parkinson’s Disease (PD) but is inherently limited by lack of responsiveness to dynamic, fluctuating symptoms intrinsic to the disease. Adaptive DBS (aDBS) adjusts stimulation in response to neural biomarkers to improve both efficacy and battery life. This thesis discusses 1) the development of dual target STN+GP aDBS with a novel, external adaptive controller and 2) the outcomes from a first in-human clinical trial in PD patients (n = 6; NCT #03815656) in order to assess the efficacy of the aDBS controller.
We performed random amplitude experiments to probe system dynamics and thus estimated initial aDBS parameters. We then implemented an innovative proportional plus integral (PI) aDBS using a novel distributed architecture. The PI aDBS controller was first evaluated in the clinic setting and then compared to cDBS in the home setting. The results showed that the PI aDBS control reduced average power delivered while preserving improved Unified Parkinson’s Disease Rating Scale (UPDRS) III scores in the clinic and reduced beta oscillations during blinded testing in the home setting. Thus, we demonstrated that the novel PI aDBS may enhance chronic, symptomatic treatment of PD.
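A minimal sketch of a proportional plus integral update of stimulation amplitude from a sensed biomarker (for example, beta-band power) is shown below; the gains, limits, and biomarker trace are illustrative placeholders, not the trial's clinical parameters.

    # Illustrative PI update for adaptive stimulation (not the trial's clinical
    # controller): amplitude is driven by the error between a sensed biomarker,
    # such as beta-band power, and a target value. All numbers are placeholders.
    import numpy as np

    def pi_adbs(beta_power, target, kp=0.5, ki=0.05, dt=1.0, amp_limits=(0.0, 3.0)):
        integral, history = 0.0, []
        for beta in beta_power:
            error = beta - target                  # positive error -> raise stimulation
            integral += error * dt
            amp = np.clip(kp * error + ki * integral, *amp_limits)
            history.append(float(amp))
        return np.array(history)

    rng = np.random.default_rng(0)
    beta = 1.0 + 0.2 * rng.standard_normal(60)     # toy beta-power trace
    print(pi_adbs(beta, target=0.8)[:5])           # stimulation amplitude over time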