Browsing by Subject "Neural network"
Item Open Access Advances in Forces Fields for Small Molecules, Water and Proteins: from Polarization to Neural Network (2018) Wang, Hao
Molecular dynamics (MD) simulation is an invaluable tool for investigating chemical and biological processes in atomic detail. The accuracy of MD simulations depends strongly on the underlying force fields. In conventional molecular mechanics (MM) force fields, the total energy is divided into bond energy, angle energy, dihedral energy, electrostatic interactions and van der Waals interactions. Each of these energy terms is parameterized by fitting to either experimental data or quantum mechanical (QM) calculations. In this dissertation, our aim is to develop accurate force fields for small molecules, water and proteins entirely from QM calculations of small fragments. In the framework of conventional MM force fields, we calculated both transferable and molecule-specific atomic polarizabilities of small molecules by electrostatic potential fitting. Atomic polarizabilities are the key physical quantities in the induced dipole polarization model. Molecular polarizabilities recovered from our atomic polarizabilities show good agreement with those obtained from QM calculations. We believe the main limitation of conventional MM force fields is the limited form of their Hamiltonian. Going beyond conventional MM force fields, we adopt the many-body expansion method and the residue-based systematic molecular fragmentation (rSMF) method to build force fields afresh for water and proteins, respectively. We used an electrostatically embedded two-body expansion as the Hamiltonian of bulk water. QM reference data for the electrostatically embedded water monomer and dimer at the CCSD/aug-cc-pVDZ level are parameterized by a neural network (NN). Compared with experimental results, our water force fields reproduce the structural and dynamical properties of bulk water well. We developed rSMF to partition general proteins into twenty amino acid dipeptides and one peptide bond. The total energy of a protein is a combination of the energies of these small fragments. The QM reference energy of each fragment is parameterized by an NN. Our protein force fields compare favorably with full QM calculations for both homogeneous and heterogeneous polypeptides in terms of energy and force errors.
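The two-body expansion named in the abstract above can be illustrated with a minimal sketch. It assumes the generic truncated many-body form, in which the total energy is the sum of monomer energies plus pairwise corrections. The function names `two_body_energy`, `monomer_energy` and `dimer_energy` are hypothetical: the two callables stand in for the NN-fitted monomer and dimer energies, and the electrostatic embedding described in the abstract is not modeled here.

```python
from itertools import combinations


def two_body_energy(monomers, monomer_energy, dimer_energy):
    """Approximate a total energy with a many-body expansion truncated at two-body terms.

    monomer_energy(m) and dimer_energy(a, b) are placeholders (assumptions) for
    fitted one-body and two-body energy models; they are not the author's code.
    """
    # One-body part: sum of isolated monomer energies.
    one_body = sum(monomer_energy(m) for m in monomers)
    # Two-body part: for each pair, the dimer energy minus both monomer energies.
    pair_corrections = sum(
        dimer_energy(a, b) - monomer_energy(a) - monomer_energy(b)
        for a, b in combinations(monomers, 2)
    )
    return one_body + pair_corrections


# Toy usage with made-up scalar "monomers" and energy functions.
toy_monomers = [1.0, 2.0, 3.0]
toy_total = two_body_energy(
    toy_monomers,
    monomer_energy=lambda m: m,
    dimer_energy=lambda a, b: a + b + a * b,
)
```

In the toy usage, each pair correction reduces to the product `a * b`, so the total is the sum of the monomer values plus the sum of the pairwise products.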
Item Open Access Advancing Deep-Generated Speech and Defending against Its Misuse (2023) Cai, Zexin
Deep learning has revolutionized speech generation, driving advances across synthesis areas such as text-to-speech and voice conversion. On the one hand, when trained on high-quality datasets, artificial voices now rival human speech in naturalness. On the other, cutting-edge deep synthesis research is making strides toward controllable systems that can generate audio in arbitrary voices and speaking styles.
Yet, despite their impressive synthesis capabilities, current speech generation systems still face challenges in controlling and manipulating speech attributes. Control over crucial attributes such as speaker identity and language, which is essential for enhancing the functionality of a synthesis system, still needs to be improved. Specifically, systems capable of cloning a target speaker's voice in cross-lingual contexts or replicating unseen voices are still in their nascent stages. At the same time, the heightened naturalness of synthesized speech has raised concerns, posing security threats to both humans and automated speech processing systems. The rise of accessible audio deepfakes, capable of spreading misinformation or bypassing biometric security, accentuates the complex interplay between advancing and defending against deep-synthesized speech.
Consequently, this dissertation delves into the dynamics of deep-generated speech, viewing it from two perspectives. Offensively, we aim to enhance synthesis systems to elevate their capabilities. On the defensive side, we introduce methodologies to counter emerging audio deepfake threats, offering solutions grounded in detection-based approaches and reliable synthesis system design.
Our research yields several noteworthy findings and conclusions. First, we present an improved voice cloning method that incorporates our novel feedback speaker consistency mechanism. Second, we demonstrate the feasibility of achieving cross-lingual multi-speaker speech synthesis with a limited amount of bilingual data, offering a synthesis method capable of producing diverse audio across various speakers and languages. Third, our proposed frame-level detection model for partially fake audio attacks proves effective in detecting tampered utterances and locating the modified regions within them. Lastly, by employing an invertible synthesis system, we can trace a converted utterance back to its original speaker. Despite these strides, each domain of our study still faces challenges, which motivates continued research and refinement.
Item Open Access GRADIENT DESCENT METHODS IN MODERN MACHINE LEARNING PROBLEMS: PROVABLE GUARANTEES (2023) Zhu, Hanjing
Modern machine learning methods have demonstrated remarkable success in many industries. For instance, the famous ChatGPT relies on a machine learning model trained with a substantial volume of text and conversation data. To achieve optimal model performance, an efficient optimization algorithm is essential for learning the model parameters. Among optimization methods, gradient descent (GD) methods are the simplest. Traditionally, GD methods have shown excellent performance in conventional machine learning problems with well-behaved objective functions and simple training paradigms where computation occurs on a single server storing all the data. However, the understanding of how GD methods perform in modern machine learning problems with non-convex and non-smooth objective functions or more complex training paradigms remains limited.
This thesis is dedicated to providing a theoretical understanding of why gradient descent methods excel in modern machine learning problems. In the first half of the thesis, we study stochastic gradient descent (SGD) in training multi-layer fully connected feedforward neural networks with Rectified Linear Unit (ReLU) activation. Since the loss function in training deep neural networks is non-convex and non-smooth, the standard convergence guarantees of GD for convex loss functions cannot be applied. Instead, through a kernel perspective, we demonstrate that when fresh data arrives in a stream, SGD ensures the exponential convergence of the average prediction error. In the second half, we investigate the use of GD methods in a new training paradigm featuring a central parameter server (PS) and numerous clients storing data locally. Privacy constraints prevent the local data from being revealed to the PS, making this distributed learning setting particularly relevant in the current big-data era, where data is often sensitive and too large to be stored on a single device. In practical applications, this distributed setting presents two major challenges: data heterogeneity and adversarial attacks. To overcome these challenges and achieve accurate estimates of model parameters, we propose a GD-based algorithm and provide convergence guarantees for both strongly convex and non-convex loss functions.
Item Open Access Improving Natural Language Understanding via Contrastive Learning Methods (2021) Cheng, Pengyu
Natural language understanding (NLU) is an essential but challenging task in Natural Language Processing (NLP), aiming to automatically extract and understand the semantic information from raw text or voice data. Among previous NLU solutions, representation learning methods, which map textual data into low-dimensional vector spaces for downstream tasks, have recently become mainstream. With the development of deep neural networks, text representation learning has achieved state-of-the-art performance in many NLP scenarios.
Although text representation learning methods with large-scale network encoders have shown significant empirical gains, many essential properties of the text encoders remain unexplored, which hinders the models' further application in real-world scenarios: (1) the high computational complexity of large-scale deep networks prevents text encoders from being deployed on a broader range of devices, especially those with limited computing power; (2) the internal mechanisms of the networks are opaque, limiting control over the latent representations for downstream tasks; (3) representation learning methods are data-driven, leading to inherent social bias problems when the data is unbalanced.
To address the problems above in deep text encoders, I propose a series of effective contrastive learning methods, which supervise the encoders by enlarging the difference between positive and negative data sample pairs. In this thesis, I first present a theoretical contrastive learning tool that connects contrastive learning methods to mutual information from information theory. Then, I apply contrastive learning to several NLU scenarios to improve the text encoders' effectiveness, interpretability, and fairness.
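The idea of supervising an encoder by separating positive from negative pairs can be illustrated with a small sketch. It uses the widely known InfoNCE loss, which is related to a lower bound on mutual information. This is a swapped-in, standard example and an assumption on our part: the abstract does not say which contrastive objective the thesis uses. The function name `info_nce_loss` and the `temperature` parameter are illustrative.

```python
import math


def _normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def info_nce_loss(anchors, positives, temperature=0.1):
    """Average InfoNCE loss over a batch of embedding pairs.

    Row i of `positives` is the positive sample for row i of `anchors`;
    every other row in the batch serves as a negative. Illustrative only.
    """
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        # Similarity of this anchor to every candidate in the batch.
        logits = [sum(x * y for x, y in zip(anchor, cand)) / temperature for cand in p]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Negative log-probability assigned to the true positive.
        total += log_denominator - logits[i]
    return total / len(a)


# Toy usage: matched pairs should score a much lower loss than mismatched ones.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0]]
loss_matched = info_nce_loss(toy_embeddings, toy_embeddings)
loss_mismatched = info_nce_loss(toy_embeddings, toy_embeddings[::-1])
```

Minimizing this loss pulls each anchor toward its positive and pushes it away from the other samples in the batch, which is the "enlarging the difference" behavior described above.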
Item Open Access Interpretable Machine Learning With Medical Applications (2023) Barnett, Alina Jade
Machine learning algorithms are being adopted for clinical use, assisting with difficult medical tasks previously limited to highly skilled professionals. AI (artificial intelligence) performance on isolated tasks regularly exceeds that of human clinicians, spurring excitement about AI's potential to radically change modern healthcare. However, there remain major concerns about the uninterpretable (i.e., "black box") nature of commonly used models. Black box models are difficult to troubleshoot, cannot provide reasoning for their predictions, and lack accountability in real-world applications, leading to a lack of trust and a low rate of adoption by clinicians. As a result, the European Union (through the General Data Protection Regulation) and the US Food & Drug Administration have published new requirements and guidelines calling for interpretability and explainability in AI used for medical applications.
My thesis addresses these issues by creating interpretable models for the key clinical decisions of lesion analysis in mammography (Chapters 2 and 3) and pattern identification in EEG monitoring (Chapter 4). To create models with discriminative performance comparable to that of their uninterpretable counterparts, I constrain neural network models using novel neural network architectures, objective functions and training regimes. The resultant models are inherently interpretable, providing explanations for each prediction that faithfully represent the underlying decision-making of the model. These models are more than just decision makers; they are decision aids capable of explaining their predictions in a way medical practitioners can readily comprehend. This human-centered approach allows a clinician to inspect the reasoning of an AI model, empowering users to better calibrate their trust in its predictions and overrule it when necessary.
Item Open Access Real-time Target Tracking in Fluoroscopy Imaging using Unet with Convolutional LSTM (2020) Peng, Tengya
Target localization precision is crucial for the treatment outcome of radiation therapy. In lung stereotactic body radiation therapy (SBRT), verifying target motion in real-time 2D fluoroscopy images is often used as a vital tool to ensure adequate coverage of the target volume before treatment delivery starts. However, accurate target localization in 2D fluoroscopy images is very challenging due to the overlapping anatomical structures in the projection images. The localization is often performed visually by physicians and physicists, which is a subjective process that depends on the experience of the clinician. In this paper, we have developed a deep learning network for automatic target localization to improve the efficiency and robustness of the process. Specifically, the deep learning network adopts a Unet architecture with a coarse-to-fine structure. In addition, we incorporate a convolutional Long Short-Term Memory (LSTM) layer into the network to exploit the temporal correlation between the fluoroscopy images. A generative adversarial method was used to train the network to further improve its localization accuracy. A hybrid loss was used to improve feature learning during training. The model was tested on a large amount of data generated by the digital X-CAT phantom. Various patient sizes, respiratory amplitudes, and tumor sizes and locations were simulated in the X-CAT phantoms to test the accuracy and robustness of the method. Our model proved accurate both on the large sample set and on a specific set of samples. On the large sample set, our model achieves an intersection over union (IoU) of 0.92 and centroid differences of 0.16 cm and 0.07 cm in the vertical and horizontal directions, respectively. On the specific set of samples, the IoU is higher still at 0.98, with centroid differences as low as 0.03 cm and 0.007 cm. In summary, our results demonstrate the feasibility of using this deep learning network for real-time target tracking in fluoroscopy images, which will be crucial for target verification before or during lung SBRT treatments.
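The two metrics reported in the abstract above, intersection over union and centroid difference, can be computed on binary segmentation masks as in the following sketch. The function names and the toy masks are illustrative assumptions, not the author's evaluation code, and the centroid offsets here are in pixels; converting them to centimeters would require the pixel spacing, which the abstract does not give.

```python
def intersection_over_union(mask_a, mask_b):
    """IoU of two equally sized binary masks given as nested lists of 0/1."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                intersection += 1
            if a or b:
                union += 1
    # Two empty masks are treated as a perfect match.
    return intersection / union if union else 1.0


def centroid(mask):
    """Mean (row, column) position of the nonzero pixels in a binary mask."""
    row_sum = col_sum = count = 0
    for r, row in enumerate(mask):
        for c, value in enumerate(row):
            if value:
                row_sum += r
                col_sum += c
                count += 1
    if count == 0:
        raise ValueError("mask has no foreground pixels")
    return row_sum / count, col_sum / count


# Toy usage: a predicted mask shifted one column to the right of the ground truth.
predicted = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
ground_truth = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
toy_iou = intersection_over_union(predicted, ground_truth)
pred_row, pred_col = centroid(predicted)
true_row, true_col = centroid(ground_truth)
vertical_offset = abs(pred_row - true_row)
horizontal_offset = abs(pred_col - true_col)
```

In this toy case the masks share two of six foreground pixels, and the predicted centroid is offset by one pixel horizontally and not at all vertically.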
Item Open Access Realtime Image Processing for Resource Constrained Devices (2018) Streiffer, Christopher
With the proliferation of embedded sensors within smartphone and Internet-of-Things devices, applications have programmatic access to more data processing than ever before. At the same time, advances in computer vision and deep learning have fostered methods for performing complex yet powerful operations on spatial and temporal data. Capitalizing on this union, applications are capable of providing advanced functionality to their users through features such as augmented reality and image classification. However, the devices responsible for running these libraries often lack the hardware needed to replicate the parallelization and straight-line speed of high-end servers. For image processing applications, this means that realtime performance is difficult to achieve without compromising functionality.
To detail this emerging paradigm, this work presents and examines two image processing applications which offer advanced functionality. The first, DarNet, utilizes the TensorFlow library to perform distracted driving classification based on image data using a Convolutional Neural Network (CNN). The second, PrivateEye, uses the OpenCV library to provide a camera-based access-control privacy framework for Android users. While this advanced processing allows for enhanced functionality, the computationally expensive operations limit the realtime performance of these applications because the devices lack sufficient hardware.
This work posits that realtime image processing applications running on resource constrained devices require the external use of edge servers. To this end, this work presents ePrivateEye, an extension to PrivateEye which provides code offloading to an edge server. The results of this work show that offloading video-frame analysis to the edge at a metro-scale distance allows ePrivateEye to analyze more frames than PrivateEye's local processing over the same period, and to achieve realtime performance of 30 fps with perfect precision and negligible impact on energy efficiency.
Item Open Access Towards Efficient and Robust Deep Neural Network Models (2022) Yang, Huanrui
Recently, deep neural network (DNN) models have shown beyond-human performance in multiple tasks. However, DNN models still exhibit outstanding issues with efficiency and robustness that hinder their application in the real world. For efficiency, modern DNN architectures often contain millions of parameters and require billions of operations to process a single input, making it hard to deploy these models on mobile and edge devices. For robustness, recent research on adversarial attacks shows that most DNN models can be misled by tiny perturbations added to the input, leaving doubts about the robustness of DNNs in security-related tasks. To tackle these challenges, this dissertation aims to advance and combine techniques from the fields of DNN efficiency and robustness, leading towards efficient and robust DNN models.
My research first advances model compression techniques including pruning, low-rank decomposition, and quantization to push the boundary of the efficiency-accuracy tradeoff in DNN models. For pruning, I propose DeepHoyer, a new sparsity-inducing regularizer that is both scale-invariant and differentiable. For decomposition, I apply the sparsity-inducing regularizer to the decomposed singular values of DNN layers, together with an orthogonality regularization on the singular vectors. For quantization, I propose BSQ to achieve an optimal mixed-precision quantization scheme by exploring bit-level sparsity, avoiding the costly search through the large design space of quantization precision. All these works successfully achieve DNN models that are both more accurate and more efficient than state-of-the-art methods. For robustness improvement, I change the previously undesired accuracy-robustness tradeoff of a single DNN model into an efficiency-robustness tradeoff of a DNN ensemble, without hurting the clean accuracy. The method, DVERGE, combines a vulnerability diversification objective and previously investigated model compression techniques, leading to an efficient ensemble whose robustness increases with the number of sub-models. Finally, I propose to unify the pursuit of accuracy and efficiency as an optimization towards robustness against weight perturbation. Thus, I introduce Hessian-Enhanced Robust Optimization to achieve highly accurate models that are robust to post-training quantization. The completion of my dissertation research paves the way towards controlling the tradeoff between accuracy, efficiency and robustness, and leads to efficient and robust DNN models.
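To make the "scale-invariant" property of a sparsity-inducing regularizer concrete, the sketch below assumes a Hoyer-style measure: the squared L1 norm divided by the squared L2 norm of the weights. The abstract does not state the formula, so this exact form and the function name `hoyer_square` are assumptions used for illustration only.

```python
def hoyer_square(weights):
    """Ratio of the squared L1 norm to the squared L2 norm of a weight vector.

    Assumed form of a Hoyer-style sparsity measure; smaller values indicate
    sparser vectors. Not taken from the author's code.
    """
    l1 = sum(abs(w) for w in weights)
    l2_squared = sum(w * w for w in weights)
    if l2_squared == 0.0:
        raise ValueError("measure is undefined for an all-zero vector")
    return (l1 * l1) / l2_squared


# Toy usage: a one-hot vector reaches the minimum value, a uniform vector the maximum.
sparse_value = hoyer_square([0.0, 0.0, 3.0, 0.0])
dense_value = hoyer_square([1.0, 1.0, 1.0, 1.0])
# Rescaling every weight by the same factor leaves the measure unchanged.
base_value = hoyer_square([1.0, -2.0, 3.0])
scaled_value = hoyer_square([2.0, -4.0, 6.0])
```

For a vector with n entries the value ranges from 1 (a single nonzero entry) to n (all entries equal in magnitude). Because numerator and denominator both scale with the square of the weights, multiplying all weights by a constant does not change the penalty, unlike a plain L1 penalty.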