Browsing by Subject "Neural networks"
Item Open Access Cloud-Based Remote Sensing for Conservation (2023-04) Slaught
This project aims to develop web-based landcover classification tools for Virunga National Park in the Democratic Republic of the Congo, leveraging the rich information provided by Sentinel-2 multi-spectral imagery. The tool will enable researchers, park managers, and other stakeholders to analyze land cover changes, identify potential threats, and develop targeted conservation strategies. However, working with multi-spectral imagery in tropical regions like Central Africa poses significant challenges due to persistent cloud cover. Hence, developing effective cloud detection systems is a prerequisite for obtaining reliable analysis-ready imagery. These detection systems must be able to distinguish clouds from similar-looking features such as bare soil and bright urban surfaces, while also accounting for the spatial and temporal variability of cloud cover in the region. The tools that were developed integrate cloud detection algorithms and image processing techniques to deliver accurate, high-quality imagery. Additionally, the tool employs machine learning and deep learning techniques to perform automatic land cover classification and provide users with an intuitive map-driven interface. This web-based remote landcover classification tool provides park managers and researchers in Virunga, as well as in other Congolese national parks, with a powerful platform for analyzing land cover changes, helping to support conservation efforts and promote sustainable land use practices.

Item Open Access Data-Driven Analysis of Zebra Finch Song Copying and Learning (2021) Brudner, Samuel Navickas
Children learn crucial skills like speech by imitating the behavior of skilled adults. Similarly, juvenile zebra finches learn to sing by imitating adults. This song learning process enables laboratory study of juvenile imitative learning, but it also poses behavioral quantification challenges. Zebra finches produce hundreds of thousands of complex vocalizations during vocal development. These undergo learned changes with respect to acoustic features that are relevant to the animal but experimentally unknown a priori. Recent developments in machine learning provide tools to reduce the dimensionality of complex behaviors, plausibly simplifying this inference challenge. These tools have not been validated on or applied to song learning problems.
Here, I validate the use of an autoencoder to extract copying-relevant features from zebra finch song. Then, I develop tools to quantify developmental song change with respect to extracted features. In particular, I generate forward models that quantify developmental changes in syllable acoustic distributions. I also develop a method to score syllable maturity on a rendition-by-rendition basis. Both techniques reveal circadian behavioral patterns that differ between normally developing and untutored juveniles, suggesting that tutoring not only sets target song acoustics; it directly affects intrinsic features of practice behavior. Critically, these tools enable concrete predictions to be drawn from otherwise abstract song learning theories.
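As an illustration of the feature-extraction step described above, the sketch below shows a small convolutional autoencoder whose bottleneck yields low-dimensional syllable features; the architecture, input size, and PyTorch code are illustrative assumptions, not the thesis implementation.

```python
# Hypothetical sketch (not the thesis code): a small convolutional autoencoder whose
# bottleneck provides low-dimensional features for syllable spectrograms.
import torch
import torch.nn as nn

class SyllableAutoencoder(nn.Module):
    def __init__(self, latent_dim=32):
        super().__init__()
        # Input: 1 x 128 x 128 log-spectrogram of a single syllable (assumed shape).
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 4, stride=2, padding=1), nn.ReLU(),   # -> 16 x 64 x 64
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),  # -> 32 x 32 x 32
            nn.Flatten(),
            nn.Linear(32 * 32 * 32, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 32 * 32), nn.ReLU(),
            nn.Unflatten(1, (32, 32, 32)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):
        z = self.encoder(x)          # low-dimensional features used for downstream analysis
        return self.decoder(z), z

model = SyllableAutoencoder()
batch = torch.randn(8, 1, 128, 128)            # stand-in for syllable spectrograms
recon, features = model(batch)
loss = nn.functional.mse_loss(recon, batch)    # reconstruction objective
```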
Item Open Access Deep Learning for Applications in Inverse Modeling, Legislator Analysis, and Computer Vision for Security (2023) Spell, Gregory Paul
To judiciously use machine learning – particularly deep learning – requires identifying how to extract features from data and effectively leveraging those features to make predictions. This dissertation concerns deep learning methods for three applications: inverse modeling, legislator analysis, and computer vision for security. To address inverse problems, we present a new method, the Mixture Manifold Network, which uses multiple neural backward models in a forward-backward architecture. We experimentally demonstrate that the Mixture Manifold Network performs better than computationally fast generative model baselines, while its performance approaches that of computationally slow iterative methods. For legislator modeling, we seek to learn representations that capture legislator attitudes that may not be contained in their voting records. We present a model that instead considers their tweeting behavior, and we use reactions to former President Donald Trump on Twitter as an illustrative example. For computer vision, we address two security-related applications using deep convolutional feature extractors. In the first, we leverage domain adaptation with deep object detection to detect threatening items – such as guns, knives, and blunt objects – in X-ray scans of air passenger luggage. In the second, we apply an occlusion-robust classifier to infrared imagery. For each application above, we describe the datasets for the problem, how the presented methods extract features from that data, and how efficacious predictions are produced by each of our proposed models.
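The Mixture Manifold Network itself is not specified in this abstract; the following is a hedged sketch of the general forward-backward idea it builds on, in which several backward networks propose candidate solutions and a learned forward model re-scores them (all dimensions and names are placeholders).

```python
# Illustrative sketch only (not the dissertation's Mixture Manifold Network): several
# backward networks each propose a candidate input x for a target response y; a learned
# forward model re-simulates each candidate and the lowest-error proposal is returned.
import torch
import torch.nn as nn

def mlp(d_in, d_out, hidden=64):
    return nn.Sequential(nn.Linear(d_in, hidden), nn.ReLU(),
                         nn.Linear(hidden, d_out))

forward_model = mlp(d_in=3, d_out=5)                           # x -> predicted y (assumed dims)
backward_models = nn.ModuleList(mlp(5, 3) for _ in range(4))   # y -> candidate x

def invert(y_target):
    candidates = [g(y_target) for g in backward_models]
    errors = [nn.functional.mse_loss(forward_model(x), y_target) for x in candidates]
    best = min(range(len(candidates)), key=lambda i: errors[i].item())
    return candidates[best]

x_hat = invert(torch.randn(1, 5))   # candidate input for a target response
```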
Item Open Access Indirect Training Algorithms for Spiking Neural Networks Controlled Virtual Insect Navigation (2015) Zhang, Xu
Even though Artificial Neural Networks (ANNs) have been shown capable of solving many problems such as pattern recognition, classification, function approximation, clinical applications, and robotics, they suffer intrinsic limitations, mainly for processing large amounts of data or for fast adaptation to a changing environment. Several characteristics, such as iterative learning algorithms or artificially designed neuron models and network architectures, are strongly restrictive compared with biological processing in natural neural networks. Spiking neural networks (SNNs), as the newest generation of neural models, can overcome the weaknesses of ANNs. Because of their biologically realistic properties, the electrophysiological recordings of neural circuits can be compared to the outputs of the corresponding spiking neural network simulated on the computer, determining the plausibility of the starting hypothesis. Compared with ANNs, it is known that any function that can be computed by a sigmoidal neural network can also be computed by a small network of spiking neurons. In addition, for processing a large amount of data, SNNs can transmit and receive a large amount of data through the timing of the spikes and remarkably decrease the interaction load between neurons. This makes very efficient parallel implementations possible.
Many training algorithms have been proposed for SNN training, mainly based on the direct update of the synaptic plasticities or weights. However, in many potential applications of adaptive spiking neural networks, including neuroprosthetic devices and CMOS/memristor nanoscale neuromorphic chips, the weights cannot be changed directly; instead, they can only be changed through the interactions of pre- and postsynaptic neural activities. The efficiency of bio-inspired, neuromorphic processing exposes the shortcomings of digital computing. After training, the simulated neuromorphic model can be applied to speaker recognition, looming detection, and temporal pattern matching. The properties of the neuromorphic chip enable it to solve the same problems while using less energy compared with other hardware. Neuromorphic chips therefore need applicable training methods that do not require direct manipulation of the connection strengths.
Nowadays, thanks to fast improvements in hardware for neural stimulation and recording technologies, neurons in vivo and in vitro can be controlled to fire precisely on millisecond timescales. These improvements enable the study of the link between synaptic-level and functional-level plasticity in the brain. However, existing training methods rely on learning rules for manipulating synaptic weights and on detailed knowledge of the network connectivity and synaptic strengths. New training algorithms are needed that require neither knowledge of the synaptic weights and connections nor direct manipulation of the synaptic strengths.
This thesis presents indirect training methods to train spiking neural networks, which can model both neuromorphic chips and biological neural networks in vivo, via extra stimuli and without knowledge of the synaptic strengths and connections. The algorithms are based on the spike timing-dependent plasticity (STDP) rule and work by controlling input spike trains. One of the algorithms minimizes the error between the synaptic weight and the optimal weight by stimulating the input neuron with an adaptive pulse train determined by the gradient of the error function. Another algorithm uses the numerical gradient of the output error with respect to the weight change to control the training stimuli, which are injected into the neural network to control a virtual insect navigating and finding a target in unknown terrain. Finally, the newest algorithm uses indirect perturbation of the temporal differences between the extra stimuli in order to train a large spiking neural network. The trained spiking neural network can control both a unicycle-modeled virtual insect and a virtual insect moving in a tripod gait. The results show that these indirect training algorithms can train SNNs to solve control problems: the trained insect can find its target while avoiding obstacles in unknown terrain. Future studies will focus on improving the insect's movement by using more complex locomotion models. The training algorithms will also be applied to biological neural networks and CMOS/memristor chips, and the trained neural networks will be used for controlling flying microrobots.
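Since the indirect algorithms above operate through the spike timing-dependent plasticity rule, a minimal sketch of the standard pair-based STDP update may help; the constants are illustrative and the code is not from the thesis.

```python
# A minimal sketch of the pair-based STDP rule that indirect training builds on:
# the weight itself is never set directly; it only changes through the relative
# timing of pre- and postsynaptic spikes (constants below are illustrative).
import math

A_PLUS, A_MINUS = 0.01, 0.012     # potentiation / depression amplitudes (assumed)
TAU_PLUS, TAU_MINUS = 20.0, 20.0  # time constants in ms (assumed)

def stdp_dw(t_pre, t_post):
    """Weight change induced by one pre/post spike pair (times in ms)."""
    dt = t_post - t_pre
    if dt > 0:    # pre before post -> potentiation
        return A_PLUS * math.exp(-dt / TAU_PLUS)
    elif dt < 0:  # post before pre -> depression
        return -A_MINUS * math.exp(dt / TAU_MINUS)
    return 0.0

# An indirect trainer can only shift input spike times (e.g., t_pre) to steer the
# induced weight change toward a desired update, as in the stimulus design above.
print(stdp_dw(10.0, 15.0), stdp_dw(15.0, 10.0))
```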
Item Open Access Machine Learning for Uncertainty with Application to Causal Inference (2022) Zhou, Tianhui
Effective decision making requires understanding the uncertainty inherent in a problem. This covers a wide scope in statistics, from deriving an estimator to training a predictive model. In this thesis, I will spend three chapters discussing new uncertainty methods developed for solving individual- and population-level inference problems, with their theory and applications in causal inference. I will also detail the limitations of existing approaches and why my proposed methods lead to better performance.
In the first chapter, I will introduce a novel approach, Collaborating Networks (CN), to capture predictive distributions in regression. It defines two neural networks with two distinct loss functions that approximate the cumulative distribution function and its inverse, respectively and collectively. This gives CN extra flexibility by bypassing the need to assume an explicit distribution family such as the Gaussian. Empirically, CN generates sharp intervals with reliable coverage.
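As a rough illustration of the two-network idea (not necessarily the exact CN objective), the sketch below pushes one network toward the conditional CDF with a binary cross-entropy loss and a second network toward its inverse by requiring consistency with the first; the architectures, losses, and data are toy assumptions.

```python
# Hedged sketch of the two-network idea: g(x, y) approximates the conditional CDF
# F(y | x), and f(x, u) approximates its inverse, so that g(x, f(x, u)) ~ u.
import torch
import torch.nn as nn

g = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))  # (x, y) -> CDF logit
f = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))  # (x, u) -> quantile estimate

x = torch.randn(256, 1)
y = 2.0 * x + 0.5 * torch.randn(256, 1)          # toy regression data

# CDF network: classify whether the observed y falls below a random query point.
y_query = y + torch.randn_like(y)
target = (y <= y_query).float()
loss_g = nn.functional.binary_cross_entropy_with_logits(
    g(torch.cat([x, y_query], 1)), target)

# Inverse network: its output at level u should be assigned CDF value ~ u by g.
u = torch.rand(256, 1)
y_hat = f(torch.cat([x, u], 1))
loss_f = ((torch.sigmoid(g(torch.cat([x, y_hat], 1))) - u) ** 2).mean()
# In actual training, the two losses would be optimized in alternation (detaching g
# when updating f); here they are only computed to show the coupling.
```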
In the second chapter, I extend CN to estimate the individual treatment effect in observational studies. It is augmented by a new adjustment scheme developed through representation learning, which is shown to effectively alleviate the imbalance between treatment groups. Moreover, a new evaluation criterion is suggested that combines the estimated uncertainty with variation in utility functions (e.g., variability in risk tolerance) for more comprehensive decision making, whereas traditional approaches only study an individual's outcome change due to a potential treatment.
In the last chapter, I will present an analysis pipeline for causal inference with propensity score weighting. Compared to other pipelines for similar purposes, this package comprises a wider range of functionalities, providing an exhaustive design and analysis platform that enables users to construct different estimators and assess their uncertainties. It offers six major advantages: it incorporates (i) visualization and diagnostic tools for checking covariate overlap and balance, (ii) a general class of balancing weights, (iii) comparisons for multiple treatments, (iv) simple and augmented (doubly-robust) weighting estimators, (v) nuisance-adjusted sandwich variances, and (vi) ratio estimands for binary and count outcomes.
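As a concrete, minimal example of the kind of weighting estimator such a pipeline produces, the sketch below computes a simple inverse-probability-of-treatment-weighting (IPW) estimate on synthetic data; it is illustrative only and does not use the package's actual API.

```python
# Minimal sketch of inverse-probability-of-treatment weighting (a simple member of the
# "balancing weights" family); variable names and the estimator are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                     # covariates
t = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))    # treatment depends on X (confounding)
y = 1.5 * t + X @ np.array([0.5, -0.3, 0.2]) + rng.normal(size=1000)

ps = LogisticRegression().fit(X, t).predict_proba(X)[:, 1]   # estimated propensity scores
w = t / ps + (1 - t) / (1 - ps)                              # IPW weights

ate = (np.average(y[t == 1], weights=w[t == 1])
       - np.average(y[t == 0], weights=w[t == 0]))
print(f"IPW estimate of the average treatment effect: {ate:.2f}")  # true effect is 1.5
```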
Item Open Access MODEL-BASED LEARNING AND CONTROL OF ADVECTION-DIFFUSION TRANSPORT USING MOBILE ROBOTS (2019) Khodayi-mehr, Reza
Mathematical models that describe different processes and phenomena are of paramount importance in many robotics applications. Nevertheless, the utilization of high-fidelity models, particularly Partial Differential Equations (PDEs), has been hindered for many years by the lack of adequate computational resources onboard mobile robots. One such problem of interest to roboticists, which can greatly benefit from more descriptive models, is Chemical Plume Tracing (CPT). In the CPT problem, one or multiple mobile robots are equipped with chemical concentration and flow sensors and attempt to localize chemical sources in an environment of interest. This problem has important applications ranging from environmental monitoring and protection to search and rescue missions. The transport of a chemical in a fluid medium is mathematically modeled by the Advection-Diffusion (AD) PDE. Despite its versatility, rigorous derivation, and powerful descriptive nature, the AD-PDE has seldom been used in its general form for the solution of the CPT problem due to its high computational cost. Instead, simplified scenarios that render closed-form solutions of the AD-PDE, or various heuristics, are often used in the robotics literature.
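For reference, a standard form of the advection-diffusion PDE reads as follows (generic notation, not copied from the thesis):

```latex
% Advection-Diffusion PDE in a standard form:
% c(x, t): concentration, u(x): velocity field, D(x): diffusivity, s(x, t): source term.
\[
  \frac{\partial c}{\partial t}
  + \mathbf{u} \cdot \nabla c
  = \nabla \cdot \bigl( D \, \nabla c \bigr) + s
\]
```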
Using the AD-PDE to model the transport phenomenon enables generalization of the CPT problem to estimate other properties of the sources, e.g., their intensity, in addition to their locations. We refer to this problem as Source Identification (SI), which we define as the problem of estimating the properties of the sources using concentration measurements that are generated under the action of those sources. We can also go one step further and consider the problem of controlling a set of sources, carried by a team of mobile robots, to generate and maintain desired concentration levels in select regions of the environment with the objective of cloaking those regions from external environmental conditions; we refer to this as the AD-PDE control problem, which has important applications in search and rescue missions.
Both the SI and AD-PDE control problems can be formulated as PDE-constrained optimization problems. Solving such optimization problems onboard mobile robots is challenging for the following reasons: (i) the computational cost of solving the AD-PDE using traditional numerical discretization schemes, e.g., the Finite Element (FE) method, is prohibitively high; (ii) obtaining accurate knowledge of the environment and of the Boundary and Initial Conditions (BICs) required to solve the AD-PDE is difficult and prone to error; and finally, (iii) obtaining accurate estimates of the velocity and diffusivity fields is challenging since, for typical transport media like air, the flow is turbulent even at very small velocities. In addition, we need to plan the actions of the mobile robots, e.g., measurement collection for SI or release rates for the AD-PDE control problem, to ensure that they accomplish their tasks optimally. This can be done by formulating a planning problem that is often solved online to take into account the latest information that becomes available to the robots. Solving this planning problem is itself a challenging task that has been the subject of intensive research in the robotics literature, because (i) the objective is often nonlinear, (ii) the planning is preferably done for more than the immediate action to avoid myopic, suboptimal plans, and (iii) the environment that the robots operate in is often non-convex and cluttered with obstacles.
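A generic PDE-constrained least-squares formulation of the SI problem, under the notation introduced above, might be written as follows (the regularizer and the exact parameterization of the sources are assumptions for illustration):

```latex
% Generic formulation: estimate source parameters \theta from concentration measurements
% y_i taken at locations x_i, subject to the AD-PDE with a parameterized source term.
\[
  \min_{\theta} \;\; \sum_{i=1}^{m} \bigl( c(x_i; \theta) - y_i \bigr)^2 + \lambda\, R(\theta)
  \quad \text{s.t.} \quad
  \frac{\partial c}{\partial t} + \mathbf{u} \cdot \nabla c
  = \nabla \cdot \bigl( D \, \nabla c \bigr) + s_{\theta}
\]
```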
In order to address the computational challenges that arise due to the use of numerical schemes, we propose using multiple mobile robots that decompose the high-dimensional optimization variables among themselves, or using nonlinear representations of the sources. In addition, we utilize Model Order Reduction (MOR) approaches that facilitate the evaluation of the AD-PDE at the expense of accuracy. To alleviate this loss of accuracy, we also propose a novel MOR method using Neural Networks that can straightforwardly replace the traditional MOR methods in our formulations. To deal with uncertainty in the PDE input data, i.e., the geometry of the environment, the BICs, and the velocity and diffusivity fields, we formulate a stochastic version of the SI problem that provides posterior probabilities over all possible values of these uncertain parameters. Finally, to obtain the velocity and corresponding diffusivity fields that are required for the solution of the AD-PDE, we rely on Bayesian inference to incorporate empirical measurements, collected and analyzed by mobile robots, into the numerical solutions obtained from computational fluid dynamics models.
In order to demonstrate the applicability of our proposed model-based approaches, we have devised and constructed an experimental setup and designed a mobile robot equipped with concentration and flow sensors. To the best of our knowledge, this dissertation is the first work to use the AD-PDE, in its general form, to solve realistic problems onboard mobile robots. Note that although here we focus on the AD-PDE and particularly chemical propagation, many other transport phenomena including heat and acoustic transport can be considered and the same principles apply. Our results are a proof of concept that we hope will convince many roboticists to use more general mathematical models in their solutions.
Item Embargo Modeling and Optimization of Emerging Technology-Based Artificial Intelligence Accelerators under Imperfections (2022) Banerjee, Sanmitra
Machine learning algorithms are emerging in a wide range of application domains, ranging from autonomous driving, real-time speech translation, and network anomaly detection to pandemic growth and trend prediction. In particular, deep learning, facilitated by highly parallelized processing in hardware accelerators, has received tremendous interest due to its effectiveness for solving complex tasks across different application domains. However, as Moore's law approaches its end, contemporary electronic deep-learning inferencing accelerators show diminishing energy efficiency and have been unable to cope with the performance demands from emerging deep learning applications. To mitigate these issues, there is a need for research efforts on emerging artificial intelligence (AI) accelerators that explore novel transistor technologies with high transconductance at the nanometer technology nodes and low-latency alternatives to metallic interconnects. In this dissertation, we focus on the modeling and optimization of two such technologies: (i) high-speed transistors built using carbon nanotubes (CNTs), and (ii) integrated photonic networks that parallelize matrix-vector multiplications.
CNTs are considered to be leading candidates for realizing beyond-silicon transistors. Owing to the ultra-thin body of CNTs and near-ballistic carrier transport, carbon nanotube field-effect transistors (CNFETs) demonstrate a high on-current/off-current ratio and low subthreshold swing. Integrated circuits (ICs) fabricated from CNFETs are projected to achieve an order of magnitude improvement in the energy-delay product compared to silicon MOSFETs. Despite these advances, several challenges related to yield and performance must be addressed before CNFET-based high-volume production can appear on industry roadmaps. While some of these challenges (e.g., shorts due to metallic CNTs and incorrect logic functionality due to misaligned CNTs) have been addressed, the impact of fabrication process variations and manufacturing defects has largely remained unexplored.
Silicon photonic networks have been known to outperform the existing communication infrastructure (i.e., metallic interconnect) in multi-processor systems-on-chip. In recent years, their application as compute platforms in AI accelerators has attracted considerable attention. Leveraging the inherent parallelism of optical computing, integrated photonic neural networks (IPNNs) can perform the otherwise time-intensive matrix multiplication in O(1) time. Given their competitive integration density, ultra-high energy efficiency, and good CMOS compatibility, IPNNs demonstrate order-of-magnitude higher performance and efficiency than their electronic counterparts. However, the performance of photonic components is highly sensitive to fabrication process variations, manufacturing defects, and crosstalk noise.
In this dissertation, we present the first comprehensive characterization of CNFETs and IPNNs under imperfections. In the case of CNFETs, we consider the impact of fabrication process variations in different device parameters and manufacturing defects that are commonly observed during fabrication. To characterize IPNNs, we consider uncertainties in the phase angles and splitting ratios in their building blocks (i.e., Mach–Zehnder interferometers), non-uniform optical loss in the waveguides, and quantization errors due to low-precision encoding of the tuned phase angles. Using detailed simulations, we show that these devices can deviate significantly from their nominal performance, even in mature fabrication processes. For example, we show that more than 90% of CNFETs can fail due to a 5% change in the CNT diameter. Similarly, the inferencing accuracy of IPNNs can drop below 10% due to uncertainties in the phase angles and splitting ratios.
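The sketch below, which uses one common convention for a Mach–Zehnder interferometer built from two ideal 50:50 couplers and two phase shifters, illustrates how small phase-angle errors perturb the realized 2x2 transfer matrix; it is an assumption-laden toy, not the dissertation's simulation code.

```python
# Illustrative NumPy sketch of how phase-angle errors perturb a single Mach-Zehnder
# interferometer: two ideal 50:50 couplers around an internal phase shifter (theta),
# preceded by an external phase shifter (phi). Convention and noise levels are assumed.
import numpy as np

BS = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)   # ideal 50:50 directional coupler

def mzi(theta, phi):
    return BS @ np.diag([np.exp(1j * theta), 1.0]) @ BS @ np.diag([np.exp(1j * phi), 1.0])

theta, phi = 0.8, 0.3
nominal = mzi(theta, phi)
perturbed = mzi(theta + np.random.normal(0, 0.05), phi + np.random.normal(0, 0.05))

# Deviation of the realized 2x2 transfer matrix from its nominal value.
print(np.linalg.norm(perturbed - nominal, ord="fro"))
```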
To ensure the adoption of accelerators based on CNFETs and IPNNs, techniques to test for and mitigate the catastrophic impact of imperfections are necessary. As the nature of imperfections in CNFETs varies significantly from that in Si-MOSFETs, existing commercial test pattern generation tools are inefficient when applied to ICs with imperfect CNFETs. This thesis presents VADF, a novel CNFET variation-aware test pattern generation tool that significantly improves the efficiency of small delay defect testing under imperfections. Unetched CNTs in the active layer can lead to parasitic FETs (para-FETs) that can cause resistive shorts; we therefore propose ParaMitE, a low-cost optimization technique that reduces the probability of para-FET occurrence and mitigates their impact on performance. The thesis also describes three optimization techniques to improve the power efficiency and reliability of IPNNs under imperfections. OptimSVD leverages the non-uniqueness of the singular value decomposition to minimize the phase angles in an IPNN while guaranteeing zero accuracy loss. We also propose CHAMP and LTPrune, which, to the best of our knowledge, are the only photonic hardware-aware magnitude pruning techniques targeted towards IPNNs.
In summary, this dissertation tackles important problems related to the reliability and high-volume yield of next-generation AI accelerators. We show how the criticality of different imperfections can change based on their magnitude and also the location and parameters of the affected components. The methods presented in this dissertation, while targeted towards CNFETs and IPNNs, can be easily extended towards other emerging technologies leveraged for AI hardware. The insights derived from this work can help designers to develop post-silicon AI accelerators that, in addition to demonstrating superior nominal performance, are resilient to inevitable imperfections.
Item Open Access Noisefield Estimation, Array Calibration and Buried Threat Detection in the Presence of Diffuse Noise (2019) Bjornstad, Joel Nils
One issue associated with all aspects of the signal processing and decision making fields is that signals of interest are corrupted by noise. This work specifically considers scenarios where the primary noise source is external to an array of receivers and is diffuse. Spatially diffuse noise is considered in three scenarios: noisefield estimation, array calibration using diffuse noise as a source of opportunity, and detection of buried threats using Ground Penetrating Radar (GPR).
Modeling the ocean acoustic noise field is impractical as the noise seen by a receiver depends on the position of distant shipping (a major contributing source of low frequency noise) as well as the temperature, pressure, salinity, and bathymetry of the ocean. Measuring the noise field with a standard towed array is also not practical due to the inability of a line array to distinguish signals arriving at different elevations, as well as the presence of the well-known left/right ambiguity. A method to estimate the noisefield by fusing data from a traditional towed array and two small-aperture planar arrays is developed. The resulting noise field estimates can be used to produce synthetic covariance matrices that perform on par with measured covariance matrices when used in a Matched Subspace Detector.
For a phased array to function effectively, the positions of the array elements must be well calibrated. Previous efforts in the literature have primarily focused on the use of discrete sources for calibration. The approach taken here focuses on using spatially oversampled, overlapping sub-arrays. The geometry of each individual sub-array is determined from Maximum Likelihood estimates of the inter-element distances followed by Multidimensional Scaling, and the overlapping sub-arrays are then combined into a single array. The algorithm developed in this work performs well in simulation. Limitations in the experimental setup preclude drawing firm conclusions based on an in-air test of the algorithm.
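The Multidimensional Scaling step can be illustrated with classical MDS, which recovers element coordinates (up to rotation and translation) from a matrix of pairwise distances; the sketch below is generic and assumes the distances have already been estimated.

```python
# Sketch of the sub-array geometry step: classical Multidimensional Scaling recovers
# 2-D element positions (up to a rigid transform) from estimated inter-element distances.
import numpy as np

def classical_mds(D, dim=2):
    """D: n x n matrix of pairwise distances. Returns n x dim coordinates."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)
    idx = np.argsort(eigvals)[::-1][:dim]        # largest eigenvalues first
    return eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0))

# Toy example: four elements on a unit square.
pts = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
print(classical_mds(D))
```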
Ground penetrating radar (GPR) is one of the most successful methods for detecting landmines and other buried threats. GPR images, however, are very noisy as the propagation path through soil is quite complex, and classifying GPR images as threats or non-threats is a challenging problem. Successful buried threat classification algorithms rely on a handcrafted feature descriptor paired with a machine learning classifier. In this work, the state-of-the-art Spatial Edge Descriptor (SED) feature was implemented as a neural network. This implementation allows the feature and the classifier to be trained simultaneously and expanded with minimal intervention from a designer. Impediments to training this novel network were identified, and a modified network is proposed that surpasses the performance of the baseline SED algorithm.
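The SED feature itself is not detailed here, so the sketch below only illustrates the general descriptor-as-network idea: a handcrafted edge filter initializes a convolutional layer, and the "feature" and the classifier are then trained jointly end-to-end (the filter, shapes, and data are placeholders, not the SED descriptor).

```python
# Generic illustration of implementing a handcrafted descriptor as trainable layers.
import torch
import torch.nn as nn

sobel = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])

feature = nn.Conv2d(1, 2, kernel_size=3, padding=1, bias=False)
with torch.no_grad():
    feature.weight[0, 0] = sobel        # horizontal-edge filter as initialization
    feature.weight[1, 0] = sobel.t()    # vertical-edge filter as initialization

classifier = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(2, 2))
model = nn.Sequential(feature, nn.ReLU(), classifier)   # trainable end-to-end

scores = model(torch.randn(4, 1, 64, 64))                # stand-in for GPR image patches
loss = nn.functional.cross_entropy(scores, torch.tensor([0, 1, 0, 1]))
loss.backward()                                          # gradients also reach the descriptor
```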
These cases demonstrate the practicality of mitigating or using diffuse background noise to achieve desired engineering results.
Item Open Access Robustness Analysis and Improvement in Neural Networks and Neuromorphic Computing (2021) Song, Chang
Deep learning and neural networks have great potential but remain at risk. So-called adversarial attacks, which apply small perturbations to input samples in order to fool models, threaten the reliability of neural networks and of their hardware counterpart, neuromorphic computing. To address such issues, various attempts have been made, including adversarial training and other data augmentation methods.
In our early attempt to defend against adversarial attacks, we propose a multi-strength adversarial training method that covers a wider effective range than typical single-strength adversarial training. Furthermore, we propose two different structures to compensate for the tradeoff between total training time and hardware implementation cost. Experimental results show that our proposed method gives better accuracy than the baselines with tolerable additional hardware cost.
To better understand robustness, we analyze the adversarial problem in the decision space. In one of our defense approaches, called feedback learning, we theoretically prove the effectiveness of adversarial training and other data augmentation methods. For empirical proof, we generate non-adversarial examples based on the information of the decision boundaries of neural networks and add these examples in training. The results show that the boundaries of the models are more robust to noise and perturbations after applying feedback learning than the baselines.
Besides algorithm-level concerns, we also focus on hardware implementations in quantization scenarios. We find that adversarially-trained neural networks are more vulnerable to quantization loss than plain models. To improve the robustness of hardware-based quantized models, we explore methods such as feedback learning, nonlinear mapping, and layer-wise quantization. Results show that adversarial and quantization robustness can be improved by feedback learning and nonlinear mapping, respectively, but the accuracy gap introduced by quantization can still be reduced further. To minimize both losses simultaneously, we propose a layer-wise adversarial-aware quantization method that chooses the best quantization parameter settings for adversarially-trained models. In this method, we use the Lipschitz constants of different layers as error sensitivity metrics and design several criteria to decide the quantization settings for each layer. The results show that our method can further minimize the accuracy gap between full-precision and quantized adversarially-trained models.
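As a minimal illustration of training at multiple attack strengths, the sketch below generates FGSM adversarial examples at several epsilon values; the model, epsilons, and data are placeholders, and this is not the thesis code.

```python
# Hedged sketch: FGSM adversarial examples at multiple strengths; a multi-strength
# adversarial training loop would mix such examples into the training batches.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))   # stand-in classifier

def fgsm(x, y, eps):
    x = x.clone().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    # Perturb each pixel in the direction that increases the loss, then clip to [0, 1].
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

x = torch.rand(16, 1, 28, 28)
y = torch.randint(0, 10, (16,))
adv_batches = [fgsm(x, y, eps) for eps in (2 / 255, 4 / 255, 8 / 255)]  # multiple strengths
```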
Item Open Access The Art of Artificial Neural Networks: Emergent Creativity in Artificial Intelligence (2018) Rajkumar, Vijay Gautham
This M.A. thesis in Computational Media is an exploration of how artificial neural networks could be used by an artist; it is a transdisciplinary study of topics in computer science, light art and installation art. The study culminates in the design and production of two installations that attempt to respond to the question: how does one draw on high-level Artificial Intelligence algorithms and human aesthetic intuition informed by contemporary and historical research of art history, to generate a new hybrid art form?
Item Open Access The Neural Basis of Involuntary Episodic Memories (2016) Hall, Shana Alexandra
Involuntary episodic memories are memories that come into consciousness without preceding retrieval effort. These memories are commonplace and are relevant to multiple mental disorders. However, they are vastly understudied. We use a novel paradigm to elicit involuntary memories in the laboratory so that we can study their neural basis. In session one, an encoding session, sounds are presented either paired with pictures or alone. In session two, in the scanner, the sound-picture pairs and unpaired sounds are re-encoded. Immediately following, participants are split into two groups: a voluntary and an involuntary group. Both groups perform a sound localization task in which they hear the sounds and indicate the side from which they are coming. The voluntary group additionally tries to remember the pictures that were paired with the sounds. Looking at neural activity, we find a main effect of condition (paired vs. unpaired sounds) showing similar activity in both groups for voluntary and involuntary memories in regions typically associated with retrieval. There is also a main effect of group (voluntary vs. involuntary) in the dorsolateral prefrontal cortex, a region typically associated with cognitive control. Turning to connectivity similarities and differences between groups, there is again a main effect of condition showing that paired > unpaired sounds are associated with a recollection network. In addition, three group differences were found: (1) increased connectivity between the pulvinar nucleus of the thalamus and the recollection network for the voluntary group, (2) a higher association between the voluntary group and a network that includes regions typically found in frontoparietal and cingulo-opercular networks, and (3) shorter path length for about half of the nodes in these networks for the voluntary group. Finally, we use the same paradigm to compare involuntary memories in people with posttraumatic stress disorder (PTSD) to trauma controls. This study also included the addition of emotional pictures. There were two main findings: (1) a similar pattern of activity was found for paired > unpaired sounds in both groups, but this activity was delayed in the PTSD group; and (2) a similar pattern of activity was found for high > low emotion stimuli, but it occurred earlier in the PTSD group than in the control group. Our results suggest that involuntary and voluntary memories share the same neural representation but that voluntary memories are associated with additional cognitive control processes. They also suggest that disorders associated with cognitive deficits, like PTSD, can affect the processing of involuntary memories.
Item Open Access Theoretical Understanding of Neural Network Optimization Landscape and Self-Supervised Representation Learning (2023) Wu, Chenwei
Neural networks have achieved remarkable empirical success in various areas. One key factor of their success is their ability to automatically learn useful representations from data. Self-supervised representation learning, which learns the representations during pre-training and applies learned representations in downstream tasks, has become the dominant approach for representation learning in recent years. However, theoretical understanding of self-supervised representation learning is scarce. Two main bottlenecks in understanding self-supervised representation learning are the big differences between pre-training and downstream tasks and the difficulties in neural network optimization. In this thesis, we present an initial exploration into analyzing the benefit of pre-training in self-supervised representation learning and two heuristics in neural network optimization.
The first part of this thesis presents our attempts to understand why the representations produced by pre-trained models are useful in downstream tasks. We assume we can optimize the training objective well in this part. For the over-realized sparse coding model with noise, we show that the masking objective used in pre-training ensures the recovery of ground-truth model parameters. For a more complicated log-linear word model, we characterize what downstream tasks can benefit from the learned representations in pre-training. Our experiments validate these theoretical results.
The second part of this thesis provides explanations for two important phenomena in the neural network optimization landscape. We first propose and rigorously prove a novel conjecture that explains the low-rank structure of the layer-wise neural network Hessian. Our conjecture is verified experimentally and can be used to tighten generalization bounds for neural networks. We also study the training stability and generalization problem in the learning-to-learn framework, where machine learning algorithms are used to learn parameters for training neural networks. We rigorously prove our conjectures in simple models and empirically verify our theoretical results in experiments with practical neural networks and real data.
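A toy experiment in the spirit of the low-rank layer-wise Hessian observation (not the thesis's setup or proof) is sketched below: it computes the Hessian of the loss with respect to one layer's weights and reports its numerical rank.

```python
# Toy sketch: Hessian of the loss with respect to a single layer's weights, and its rank.
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(32, 5)
y = torch.randint(0, 3, (32,))
W2 = torch.randn(3, 4)                       # second layer, held fixed here

def loss_wrt_first_layer(W1):
    h = torch.tanh(x @ W1.t())               # first layer (5 -> 4)
    logits = h @ W2.t()                      # second layer (4 -> 3)
    return nn.functional.cross_entropy(logits, y)

W1 = torch.randn(4, 5)
H = torch.autograd.functional.hessian(loss_wrt_first_layer, W1)   # shape (4, 5, 4, 5)
H = H.reshape(20, 20)
eigs = torch.linalg.eigvalsh(H)
print("numerical rank:", int((eigs.abs() > 1e-6 * eigs.abs().max()).sum()))
```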
Our results provide theoretical understanding of the benefits of pre-training for downstream tasks and of two important heuristics in the neural network optimization landscape. We hope these insights can further improve the performance of self-supervised representation learning approaches and inspire the design of new algorithms.
Item Open Access Towards Better Representations with Deep/Bayesian Learning (2018) Li, Chunyuan
Deep learning and Bayesian learning are two popular research topics in machine learning. They provide flexible representations in a complementary manner; therefore, it is desirable to take the best from both fields. This thesis focuses on the intersection of the two topics, enriching one with the other. Two new research topics are thus inspired: Bayesian deep learning and deep Bayesian learning.
In Bayesian deep learning, scalable Bayesian methods are proposed to learn the weight uncertainty of deep neural networks (DNNs). On this topic, I propose preconditioned stochastic gradient MCMC methods, show their connection to dropout, and demonstrate their applications to modern network architectures in computer vision and natural language processing.
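A minimal sketch of a preconditioned SGLD step, in the spirit of RMSprop-preconditioned stochastic gradient MCMC, is shown below; the constants, the omission of the curvature correction term, and the toy target are all simplifying assumptions, and this is not the thesis implementation.

```python
# Hedged sketch of one preconditioned SGLD update for sampling weights.
import torch

def psgld_step(param, grad, v, lr=1e-3, alpha=0.99, eps=1e-5):
    """One update given a stochastic gradient `grad` of the negative log-posterior;
    `v` is the running average of squared gradients (the preconditioner state)."""
    v.mul_(alpha).addcmul_(grad, grad, value=1 - alpha)
    precond = 1.0 / (eps + v.sqrt())                       # diagonal preconditioner
    noise = torch.randn_like(param) * torch.sqrt(lr * precond)
    param.add_(-0.5 * lr * precond * grad + noise)         # drift + injected Gaussian noise
    return param, v

w, v = torch.zeros(10), torch.zeros(10)
for _ in range(1000):
    grad = (w - 1.0) + 0.1 * torch.randn_like(w)   # noisy gradient of -log N(w; 1, I)
    w, v = psgld_step(w, grad, v)
print(w.mean())   # the samples drift toward the posterior mode at 1
```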
In deep Bayesian learning, DNNs are employed as powerful representations of conditionals in traditional Bayesian models. I focus on understanding recent adversarial learning methods for joint distribution matching, through which several recent bivariate adversarial models are unified. This analysis further reveals non-identifiability issues in bidirectional adversarial learning, and I propose ALICE, a conditional entropy framework, to remedy these issues. The derived algorithms show significant improvement in the tasks of image generation and translation by resolving the non-identifiability issues.