Duke migrated to an electronic-only system for dissertations between 2006 and 2010. As such, dissertations completed between 2006 and 2010 may not be part of this system, and those completed before 2006 are not hosted here except for a small number that have been digitized. For access to dissertations created prior to 2006 and those not submitted electronically, please see: https://library.duke.edu/find/theses-dissertations.

Policies & procedures governing electronic theses & dissertations can be found on the graduate school website.


Recent Submissions

  • Item (Embargo)
    Deep Learning Algorithms for Automating and Accelerating the Cryo-EM Data Processing Pipeline
    (2023) Huang, Qinwen

    Cryo-electron microscopy (cryo-EM) has solidified its position in the structural biology field as an invaluable method for achieving near-atomic resolution of macromolecular structures in their native conditions. However, the inherently fragile nature of biological samples imposes stringent limitations on the electron doses that can be used during imaging, resulting in data characterized by notably low signal-to-noise ratios (SNR). To obtain a three-dimensional (3D) representation of these biological entities, substantial volumes of data need to be acquired and averaged in 3D to remove noise and improve resolution. The cryo-EM structure determination workflow involves many intricate steps, starting with sample preparation and vitrification, progressing to sample screening and data collection. During data analysis, macromolecular structures of interest need to be accurately identified and localized before they can be used for 3D reconstruction. A key challenge in this process is the extensive manual intervention and time required to analyze the large volumes of data that are necessary to achieve high resolution. In this thesis, we propose strategies that harness the capabilities of deep learning to accelerate and reduce manual intervention during the data acquisition and image processing pipelines, with the goal of automating and streamlining the determination of protein structures of biomedical relevance.

    To improve the efficiency of data collection, we introduce cryo-ZSSR, a deep internal learning-based method that enables the determination of 3D structures at resolutions surpassing the limits imposed by the imaging system. By combining low-magnification imaging with in-silico image super-resolution (SR), cryo-ZSSR accelerates cryo-EM data collection by allowing more particles to be included in each exposure without sacrificing resolution. To mitigate the need for manual intervention and further streamline sample screening and data collection, we develop the Smartscope framework, which leverages deep learning-based navigation techniques to enable specimen screening in a fully automated manner, significantly increasing efficiency and reducing operational costs. For downstream data processing, we introduce deep learning-based detection algorithms to streamline and automate particle identification in both 2D (single-particle analysis, SPA) and 3D (cryo-electron tomography, CET). Our approach enables precise detection of proteins of interest with minimal human intervention while reducing detection time from days to minutes, allowing the analysis of larger datasets than previously possible.

    Collectively, we show that these methods substantially boost the efficiency of cryo-EM data acquisition and help streamline the SPA and CET image analysis pipelines, paving the way for the development of high-throughput strategies for high-resolution structure determination of biomolecules. We conclude this thesis by discussing the potential benefits and shortcomings of using deep learning-based algorithms in cryo-EM image analysis tasks.

  • Item (Embargo)
    Defining MAP4K3-mediated Signaling Pathways That Regulate mTORC1 Activation and Beyond
    (2023) Branch, Mary Rose

    Germinal center kinases (GCKs) belong to the mammalian Ste20-like family of serine/threonine kinases and participate in various signaling pathways needed to regulate a wide range of cellular activities. GCK-like kinase (GLK), also known as MAP4K3, belongs to the MAP kinase kinase kinase kinase (MAP4K) family of proteins and has recently been established as a key node in the amino acid response pathway and a putative nutrient-sensing regulator in cells, as it is required for the amino acid-dependent activation of the mechanistic target of rapamycin complex 1 (mTORC1), a central regulator of cell growth and metabolism. The precise mechanism(s) by which MAP4K3 activates mTORC1 under conditions of amino acid satiety, however, are undefined. Recent studies in the La Spada lab suggest that MAP4K3 activates mTORC1 by phosphorylating the NAD-dependent deacetylase sirtuin 1 (Sirt1) and subsequently inhibiting the LKB1-AMPK pathway, which suppresses mTORC1 activation during starvation. MAP4K3 has additionally been linked to the regulation of cellular stress responses, autophagy, growth, survival, and organismal lifespan through largely unknown pathways. My working hypothesis is that MAP4K3 serves as an amino acid sensor that activates mTORC1 through phosphorylation of Sirt1 and subsequent inhibition of the mTORC1-suppressing Sirt1-LKB1-AMPK pathway under conditions of amino acid satiety, and that it engages different biological pathways by virtue of its protein interacting partners to control critical cellular processes involved in cell growth, survival, and lifespan. In study 1, I used amino acid depletion/restimulation experiments and phospho mass spectrometry to establish a direct link between MAP4K3 and the Sirt1-LKB1-AMPK pathway and determined that Sirt1 is phosphorylated at Threonine 344 (T344) in a MAP4K3- and amino acid-dependent manner. Furthermore, I showed that phosphorylation of T344 inhibits Sirt1 and is sufficient to restore amino acid-dependent mTORC1 activation in cells lacking MAP4K3. To elucidate additional pathways regulated by MAP4K3, in study 2, I sought to discover novel MAP4K3 interacting partners by integrating proteomics interactome data and phosphoproteomics data, followed by validation studies in cells. Experiments from these studies indicate a novel role for MAP4K3 in regulating DNA double-strand break (DSB) sensing and repair in the nucleus, mTOR localization to the lysosome through the GATOR2 complex, and endocytosis. Recent discoveries regarding the important role for MAP4K3 in nutrient sensing through mTORC1 activation and other cellular activities, including cell growth, autophagy, and survival, are significant because deregulation of these cellular processes has been implicated in aging, as well as a wide array of human diseases including cancer, immunological disorders, and neurodegeneration. This dissertation thus sheds light on the molecular mechanisms by which MAP4K3 regulates these processes and provides significant insight into the modulation of these pathways in health and disease states.

  • Item (Open Access)
    JNK Signaling Mediates Glial Proliferation in the Regenerating Zebrafish Spinal Cord
    (2023) Becker, Clayton J

    Zebrafish possess the remarkable capacity to regenerate from spinal cord injuries that would leave mammals such as humans permanently paralyzed. Much research into zebrafish spinal cord regeneration has focused on identifying extracellular growth factors and matrix components which create a pro-regenerative environment; however, it is just as important to identify and understand the transcription factors which control pro-regenerative transcriptional responses within the resident stem cell population of the spinal cord, and the signaling cascades which translate the known extracellular ligands into cellular responses. Using CRISPR/Cas9, we generated two novel transcription factor knockout zebrafish lines which we tested for spinal cord regeneration defects and found no difference in regenerative capacity. Using a chemical screen, we identify JNK signaling as a necessary regulator of glial cell cycling and tissue bridging during spinal cord regeneration in larval zebrafish. With a kinase translocation reporter, we visualize and quantify JNK signaling dynamics at single-cell resolution in glial cell populations in developing larvae and during injury-induced regeneration. Glial JNK signaling is patterned in time and space during development and regeneration, decreasing globally as the tissue matures and increasing in the rostral cord stump upon transection injury. Thus, we present a tool to visualize signaling activity in the larval zebrafish spinal cord and demonstrate that dynamic JNK activity after spinal cord injury directs a proliferative response of glial cells during spinal cord regeneration.

  • Item (Open Access)
    The Omic Modifiers of Morbidity and Mortality in Sickle Cell Disease
    (2023) Lê, Brandon Minh

    Sickle cell disease (SCD) is a human genetic disorder caused by a mutation in the hemoglobin beta gene, causing sickling of red blood cells (RBCs) under hypoxic conditions, vaso-occlusion and adherence to other cells and endothelium, and downstream cellular and organ damage, ultimately resulting in higher morbidity and mortality relative to healthy people. While SCD is a Mendelian disorder defined by mutation in a single gene, the clinical presentation of people with SCD is highly heterogeneous. Typical SCD complications like acute chest syndrome (ACS), pain crises, and strokes are common but not universal, the range of severity of these outcomes is highly variable (higher morbidity, but not in all people with SCD), and life expectancy is lower on average (United States: 54 years). While the hemoglobin beta locus has been comprehensively studied as the origin of SCD, the other genetic and “-omic” factors that modify the disease presentation are less well understood. Investigation into these omic modifiers of SCD may provide insight into many potential therapeutic targets that can greatly increase the quality of life and lifespan of people with SCD.

    To advance knowledge of omic modifiers of SCD, multiple approaches combining large-scale biological datasets with new methodologies and toolkits have been used to assess SCD progression across multiple facets. Whereas prior research on SCD modifiers has been performed on smaller datasets with limited genomic data, we have performed genome-wide analyses with whole-genome sequences across much larger cohorts of people with SCD. In addition, other omic datasets are addressed. Variability in methylation at CpG sites is used to provide measurements of biological aging in SCD that differ from normal, healthy biological aging.

    Across these analyses, a more comprehensive assessment of the omic modifiers of morbidity and mortality in SCD is achieved. Further work will serve to validate the results of these analyses and recommend omic variants for investigation in therapeutic interventions.

  • Item (Embargo)
    Toward Real-time, High-performance, and Generalizable Eating Episode Detection and Postprandial Carbohydrate Content Classification Using Non-invasive Wearables
    (2023) Chikwetu, Lucy

    As society grapples with the rising prevalence of diet-related diseases such as type 2 diabetes and coronary heart disease, the need for effective dietary monitoring becomes increasingly critical. Traditional self-reporting approaches such as 24-hour dietary recalls and food frequency questionnaires, while considered gold standard approaches, are plagued by high costs, significant memory demands on users, and inaccuracies, rendering them less than ideal for addressing the current health crisis. This reality has spurred the development of innovative dietary monitoring techniques, resulting in the advent of Automatic Dietary Monitoring (ADM) systems. These systems are designed to automatically track critical aspects of food intake, including the timing and duration of meals, the quantity of food consumed, and its nutritional content.

    This dissertation investigates the development of a real-time, high-performance, and generalizable eating detection platform using heart rate data from non-invasive wearables to detect eating episodes when an individual is sitting down. Additionally, it delves into the development of algorithms capable of classifying the carbohydrate content in foods using heart rate data from medical-grade, non-invasive wearables.

    We developed timeStampr—an iOS application for collecting timestamps essential for data labeling and ground truth establishment. We collected heart rate data from 23 participants in a controlled yet naturalistic laboratory setting using an Empatica E4 worn on the upper arm while individuals were eating. From the initial cohort, we excluded data from three participants due to sensor irregularities with dark skin tones and failure to meet the study’s health criteria.

    Our classifiers exhibited robust performance within a 90-second window, with the eating detection model achieving at least 87% in accuracy, precision, recall, and AUC-ROC, while the carbohydrate content model attained a minimum of 84% across these same metrics, all utilizing heart rate data from an Empatica E4. Additionally, this work demonstrates real-time testing of predictive models through RESTful APIs. Overall, the results of this dissertation demonstrate the potential of heart rate in eating detection and carbohydrate content classification.
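
    The 90-second windowing mentioned above can be illustrated with a minimal sketch. It is not the dissertation's pipeline: the 1 Hz sampling rate, the non-overlapping windows, the summary features, and the function name `window_features` are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def window_features(heart_rate, fs_hz=1.0, window_s=90.0):
    """Split a heart-rate series into non-overlapping windows and summarize each.

    Assumptions (not from the source): samples arrive at fs_hz, windows do not
    overlap, and mean/std/min/max are used as placeholder features.
    """
    size = int(round(fs_hz * window_s))
    if size <= 0:
        raise ValueError("window must contain at least one sample")
    features = []
    # A trailing partial window is dropped rather than padded.
    for start in range(0, len(heart_rate) - size + 1, size):
        window = heart_rate[start:start + size]
        features.append({
            "mean": mean(window),
            "std": pstdev(window),
            "min": min(window),
            "max": max(window),
        })
    return features


# Hypothetical signal: 90 s at 60 bpm, 90 s at 80 bpm, then 30 s left over.
signal = [60.0] * 90 + [80.0] * 90 + [70.0] * 30
for row in window_features(signal):
    print(row)
```

    Each feature row would then be passed to a classifier; which features and which model the dissertation actually uses is not stated in the abstract.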

  • Item (Embargo)
    Data-Driven Study of Polymer-Based Nanocomposites (PNC) – FAIR Online Data Resource Development and ML-Facilitated Material Design
    (2023) Lin, Anqi

    Polymer-based nanocomposites (PNCs) are materials consisting of nanoparticles and polymers. The nanoparticles enhance the mechanical, thermal, electrical, and other properties of PNCs, making them useful materials in various applications. The large surface area of the nanoparticles interacts with polymer chains to form an interphase, which drives the property change. The presence of the interphase adds to the complexity of the processing-structure-property (p-s-p) relationship of PNCs that guides material design. As conventional trial-and-error approaches in the laboratory prove time-consuming and resource-intensive, an alternative approach is to utilize data-driven methods for PNC design. However, data-driven material design suffers from data scarcity issues.

    To tackle the data scarcity issue on a cross-community level, there has been a growing emphasis on the adoption of the Findable, Accessible, Interoperable, and Reusable (FAIR) data principles. In 2016, the NanoMine data resource (which later evolved into MaterialsMine with the inclusion of metamaterials) and its accompanying schema were introduced to handle PNC data, offering a user-friendly and FAIR approach to managing these complex data and facilitating data-driven material design. To make this schema more accessible to curators and material scientists, an Excel-based customizable master template was designed for experimental data. In contrast to the long-running, cumulative effort of curating experimental PNC data from the literature, simulation data can be generated and curated much faster due to its computational nature. Thus, the NanoMine schema and template for experimental data were expanded to support popular simulation methods like Finite Element Analysis (FEA) with high utilization of existing fields, demonstrating the flexibility of the schema/template approach.

    With the schema and template in place for NanoMine to host FEA data, an efficient and highly automated end-to-end pipeline was developed for FEA data generation. A data management system was implemented to capture the FEA data and the associated metadata, which are critical for the data to be FAIR. A resource management system was implemented to address the system restrictions. From microstructure generation all the way to packaging the data into a curation-ready format, the pipeline lives in a standardized Jupyter notebook for easier usage and better bookkeeping. FEA simulations, while faster than laboratory experiments, remain resource-intensive and are often constrained by commercial software licenses. Thus, the last part of this research aims to develop an efficient, reliable, and lightweight machine learning (ML) surrogate model, named ViscoNet, for FEA simulation of the viscoelastic response of PNCs. Drawing inspiration from NLP models like GPT, ViscoNet utilizes pre-training and fine-tuning techniques to reproduce FEA simulations, achieving a mean absolute percentage error (MAPE) of < 5% for rubbery modulus, < 1% for glassy modulus, and 1.22% for tan delta peak, with as few as 500 FEA simulation data points for fine-tuning. ViscoNet demonstrates impressive generalization capabilities from thermoplastics to thermosets. ViscoNet enables the generation of over 20k VE responses in under 2 minutes, making it a versatile tool for high-throughput PNC design and optimization. Notably, ViscoNet does not require a GPU for training, allowing anyone with Internet access to download 500 FEA data points from NanoMine and fine-tune ViscoNet on a personal laptop, thereby making data-driven materials design accessible to a broader scientific community.
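
    For readers unfamiliar with the MAPE metric quoted above, the following is a minimal sketch of how it is commonly computed. The function name `mape` and the sample values are illustrative assumptions, not data or code from the dissertation.

```python
def mape(reference, predicted):
    """Mean absolute percentage error, returned in percent.

    reference: ground-truth values (here standing in for FEA results)
    predicted: surrogate-model outputs of the same length
    Reference values must be non-zero, since each error is divided by them.
    """
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((r - p) / r) for r, p in zip(reference, predicted))
    return 100.0 * total / len(reference)


# Hypothetical modulus values; not taken from NanoMine or ViscoNet.
fea_values = [2.0, 4.0, 5.0]
surrogate_values = [2.1, 3.8, 5.0]
print(f"MAPE = {mape(fea_values, surrogate_values):.2f}%")
```

    A MAPE below 5% therefore means that, averaged over the test cases, the surrogate's prediction deviates from the FEA value by less than 5% of that value.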

  • Item (Embargo)
    Speaker Diarization with Deep Learning: Refinement, Online Extension and ASR Integration
    (2023) Wang, Weiqing

    As speech remains an essential mode of human communication, the necessity for advanced technologies in speaker diarization has risen significantly. Speaker diarization is the process of accurately annotating individual speakers within an audio segment, and this dissertation explores this domain, systematically addressing three prevailing challenges through intertwined strands of investigation.

    Initially, we focus on the intricacies of overlapping speech and refine conventional diarization systems by integrating sequential information. Our approach not only recognizes these overlapping segments but also discerns the distinct speaker identities contained within, ensuring that each speaker is precisely categorized.

    Transitioning from the challenge of overlapping speech, we then address the pressing need for real-time speaker diarization. In response to the growing need for low-latency applications in various fields, such as smart agents and transcription services, our research adapts traditional systems, enhancing them to function seamlessly in real-time applications without sacrificing accuracy or efficiency.

    Lastly, we turn our attention to the vast potential of contextual and textual data. Incorporating both audio and text data into speaker diarization not only augments the system's ability to distinguish speakers but also leverages the rich contextual cues often embedded in conversations, further improving the overall diarization performance.

    Through a coherent and systematic exploration of these three pivotal areas, the dissertation offers substantial contributions to the field of speaker diarization. The research navigates through the challenges of overlapping speech, real-time application demands, and the integration of contextual data, ultimately presenting a refined, reliable, and efficient speaker diarization system poised for application in diverse and dynamic communication environments.

  • Item (Embargo)
    Patterns of Autoantibody Expression in Multiple Sclerosis and Systemic Lupus Erythematosus Unveiled Through the Development of an Autoantigen Discovery Technology
    (2023) Doan, Europe Bailey

    The human body contains over 290 quintillion (10^18) antibodies that circulate the blood and express extensive diversity in binding pathogens, allergens, or even self-proteins. Characterizing the antibody repertoire is useful for understanding pathogen response, cancer development, and autoimmune diseases. Current technologies have enabled the characterization of a subset of antibodies expressed in health and disease; however, continued technological advances may enable a more complete characterization of human antibody repertoires and an understanding of their contributions to pathogen protection, cancer surveillance, and autoimmune disease development.

    In this work, I describe the development of the Antigenome Platform for improved antibody discovery. This platform is a high-throughput assay composed of large-fragment cDNA libraries, a phage-display and serum antibody screening technology, and a robust bioinformatics analysis pipeline. The work described herein extends prior methodology through the development of a rigorous procedure for cDNA library preparation that allows the display of in-frame human cDNA fragments that are up to 250 amino acids long. I applied this technology in the context of autoimmune diseases; therefore, human transcripts were used for cDNA library generation. Moreover, this platform assesses antibody binding to targets across approximately 90% of the human genome, allowing for an agnostic and robust evaluation of autoantibodies.

    The Antigenome Platform was applied to the study of two autoimmune diseases: multiple sclerosis (MS) and systemic lupus erythematosus (SLE). MS is a debilitating autoimmune disease of the central nervous system (CNS), which is characterized by demyelination and axonal injury and is often preceded by a demyelinating event called clinically isolated syndrome (CIS). Despite the importance of B cells and autoantibodies in MS pathology, their target specificities remain largely unknown. Therefore, I employed the Antigenome Platform for an agnostic and comprehensive evaluation of autoantibodies in MS. Toward this goal, I assayed serum samples from both placebo and treated MS patients enrolled in the REFLEX clinical trial, which assessed the effects of interferon beta-1a (Rebif®) on the conversion from CIS to MS. Serum autoantibodies from MS patients significantly and reproducibly enriched for known and novel protein targets; 166 targets were selected by >10% of patients’ sera. Further, 10 autoantibody biomarkers predicted conversion from CIS to MS, and 17 predicted patient responses to interferon beta-1a therapy. These findings indicate the existence of widespread autoantibody production in MS and provide novel biomarkers for continued study and prediction of disease progression.

    SLE is an autoimmune disease characterized by a wide array of clinical and immunologic features, including abundant production of autoantibodies, especially to components of the cell nucleus. The marked heterogeneity among SLE patients hinders research, our understanding of the disease, therapeutic development, and clinical trial success. As a result, there is great interest in biomarkers for distinguishing subtypes or clusters of SLE patients. Previous studies have identified four clusters of SLE patients based on patterns of autoantibody expression to 10-20 common target antigens. Since SLE patients are known to express autoantibodies to a much larger number of self-antigens (roughly 200 targets have been identified), I sought to further define SLE clusters with more comprehensive autoantibody profiles to improve and understand the molecular basis of clustering.

    The work described herein focused on serum samples from four SLE patient clusters defined by the expression of 13 autoantibodies. In a prior analysis, Cluster 1 patients exhibited positivity for antibodies to Ro60/Ro52/La; Cluster 2 patients exhibited positivity for antibodies to nucleosome/SmRNP/DNA/RNPA; Cluster 3 patients exhibited positivity for antibodies to beta2GP1/aCL-IgG/IgM; and Cluster 4 patients were negative for all 13 tested autoantibodies. Since it is a challenge to cluster some patients according to these criteria (i.e., some patients exhibit autoantibody profiles matching more than one cluster) and there are no known biomarkers for Cluster 4 patients, I employed the Antigenome Platform for the discovery of additional autoantibodies associated with each cluster.

    I identified patterns of 88, 49, 10, and 24 autoantibodies associated with the four SLE clusters, including autoantibodies for the subgroup previously defined by a lack of common SLE autoantibodies. While clinicians and researchers generally focus on determining the specificities and function of anti-nuclear antibodies in SLE, my findings of anti-cytoplasmic antibodies call attention to the potential importance of cytoplasmic antigens in SLE. I further report autoantibodies targeting the kidney-enhanced cell surface protein BCAM in a cluster of patients associated with nephritis. By discovering new autoantigens, this study provides the basis for novel serological assays to improve understanding of autoantibody clustering and its impact on disease expression and outcome.

    The work described herein collectively provides a methodology for improved antigen target discovery that can be applied to the study of autoimmune disease, cancer, viral infection, and other applications. I have applied this technology to study the autoimmune diseases MS and SLE and have identified several autoantibody targets that shed light on each disease and provide potential biomarkers for precision-medicine approaches.

  • Item (Embargo)
    Resonant Infrared Matrix-Assisted Pulsed Laser Evaporation: Advancing Methodology and Elucidating Mechanisms for Precise Control of Film Morphology and Composition
    (2023) Zhang, Buang

    The aim of this dissertation is to advance the understanding and methodology of resonant infrared matrix-assisted pulsed laser evaporation (RIR-MAPLE), with a focus on the surfactant chemistry in organic emulsions, the fine modulation of film composition, and the development of novel deposition strategies combining both emulsion-based and solution-based target chemistries. RIR-MAPLE is a special thin-film deposition technique that extends the capabilities of the traditional pulsed laser deposition process to deposit a wide range of materials, including sensitive organic polymers (emulsion-based targets) and emerging hybrid perovskite materials (solution-based targets). It utilizes a pulsed laser, typically in the infrared range, which resonates with the vibrational modes of the matrix solvent molecules in a frozen target that comprises the material of interest dissolved in a primary solvent. The energy absorbed by the matrix solvent results in its sublimation, and kinetic energy is transferred to the solute (the material to be deposited), leading to solute ejection from the target without damaging sensitive organic materials. Although extensive research has previously been conducted to decipher the complexities of the RIR-MAPLE deposition process, several issues remain unresolved, and the full potential of this innovative technique has yet to be harnessed. This study addresses and investigates pivotal challenges inherent to RIR-MAPLE, paving the way for its application in a broader spectrum of future technological advancements.

    First, the deposition of organic material and its corresponding emulsion chemistry, specifically surfactant usage, was studied to determine predictive correlations with emulsified particle morphology and thin film properties. Consequently, the impact of surfactant choice on micelle density and the non-bonded interaction amongst disparate components was first documented and substantiated by the demonstration of blue organic light-emitting diodes (OLEDs). Second, an investigation was conducted to understand mechanisms for achieving fine control of composition when two organic materials are blended in a single film. An OLED exhibiting broadband emission due to two different polymer constituents was fabricated to demonstrate the precise control achievable over blended compositions. Consequently, it was discerned that the two organic materials remained contained within their respective emulsified particles and were sequentially deposited atop one another without influencing the previously deposited material. Third, the versatility of RIR-MAPLE was investigated by depositing films using solution-based and emulsion-based targets, as demonstrated by hybrid organic-inorganic perovskite (HOIP)-organic macromolecule nanocomposite films. A general approach for depositing such mixed nanocomposites was developed for RIR-MAPLE for the first time, subsequently unveiling the interactions between the organic macromolecule and the hybrid perovskite and achieving a fundamentally different compositional range and film morphology compared to other nanocomposite approaches.

  • Item (Open Access)
    Essays on the Direction of Technical Change
    (2023) Dionisi, Bernardo Alessandro

    The rate and direction of inventive activity are central to firms’ competitiveness as well as economic growth. A fundamental question arises from the fact that innovation increasingly relies on public scientific findings, many of which are freely published and accessible. How can firms capture private value from publicly available knowledge? Chapter 2 investigates the concept of first-mover advantage in utilizing public science. It finds that being first to apply cutting-edge public science enables broader patents, but requires active internal R&D to recognize opportunities early. Chapter 3 surveys the extensive literature employing quasi-experimental techniques to quantify forces shaping the direction of innovation. It contributes software tailored for innovation data to assist future research. Chapter 4 leverages FDA data to empirically analyze post-acquisition innovation trajectories in medical devices, revealing the slowed evolution of acquired technologies.

    Overall, this dissertation elucidates how firms derive competitive advantage from public science, surveys techniques for quantifying innovation’s direction, and empirically examines merger impacts on medical technology advancement. The studies contribute novel data, methods, and insights to support firms, policymakers, and other stakeholders with an interest in the evolution of technical change.

  • Item (Embargo)
    Leveraging Data Augmentation in Limited-Label Scenarios for Improved Generalization
    (2024) Ravindran, Swarna Kamlam

    The resurgence of Convolutional Neural Networks (CNNs) from the early foundational work is largely attributed to the advent of extensive manually labeled datasets, which has made it possible to train high-capacity models with strong generalization capabilities. However, the annotation cost for these datasets is often prohibitive, and so training CNNs on limited data in a fully-supervised setting remains a crucial problem. Data augmentation is a promising direction for improving generalization in scarce data settings.

    We study foundational augmentation techniques, including Mixed Sample Data Augmentations (MSDAs) and a no-parameter variant of RandAugment termed Preset-RandAugment, in the fully supervised scenario. We observe that Preset-RandAugment excels in limited-data contexts while MSDAs are moderately effective. In order to explain this behaviour, we refine ideas about diversity and realism from prior work and propose new ways to measure them. We postulate an additional property when data is limited: augmentations should encourage faster convergence by helping the model learn stable and invariant low-level features, focusing on less class-specific patterns. We explain the effectiveness of Preset-RandAugment in terms of these properties and identify low-level feature transforms as a key contributor to performance.

    Building on these insights, we introduce a novel augmentation technique called RandMSAugment that integrates complementary strengths of existing methods. It combines low-level feature transforms from Preset-RandAugment with interpolation and cut-and-paste from MSDA. We improve image diversity through added stochasticity in the mixing process. RandMSAugment significantly outperforms the competition on CIFAR-100, STL-10, and Tiny-ImageNet. With very small training sets (4, 25, 100 samples/class), RandMSAugment achieves compelling performance gains between 4.1% and 6.75%. Even with more training data (500 samples/class) we improve performance by 1.03% to 2.47%. We also incorporate RandMSAugment augmentations into a semi-supervised learning (SSL) framework and show promising improvements over the state-of-the-art SSL method, FlexMatch. The improvements are more significant when the number of labeled samples is smaller. RandMSAugment does not require hyperparameter tuning, extra validation data, or cumbersome optimizations.

    Finally, we combine RandMSAugment with another powerful generalization tool, ensembling, for fully-supervised training with limited samples. We show additional improvements on the 3 classification benchmarks, which range between 2% and 5%. We empirically demonstrate that the gains due to ensembling are larger when the individual networks have moderate accuracies, i.e., outside of the low and high extremes. Furthermore, we introduce a simulation tool capable of providing insights about the maximum accuracy achievable through ensembling, under various conditions.
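
The relationship between individual accuracy and ensemble gain can be illustrated with a toy calculation. The sketch below is not the dissertation's simulation tool; it assumes, purely for illustration, that every network is correct independently with the same probability `p` and that the ensemble is correct whenever a strict majority of networks is correct. Real networks make correlated errors, so these numbers are optimistic. The function name `majority_vote_accuracy` is hypothetical.

```python
from math import comb


def majority_vote_accuracy(p: float, k: int) -> float:
    """Probability that a strict majority of k independent voters is correct.

    Assumption (not from the abstract): each voter is correct independently
    with the same probability p. k must be odd so that ties cannot occur.
    """
    if k % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    need = k // 2 + 1  # smallest number of correct votes that forms a majority
    return sum(comb(k, m) * p**m * (1 - p) ** (k - m) for m in range(need, k + 1))


if __name__ == "__main__":
    # Compare the gain from a 5-member ensemble at several individual accuracies.
    for p in (0.55, 0.70, 0.95):
        acc = majority_vote_accuracy(p, 5)
        print(f"single={p:.2f} ensemble={acc:.4f} gain={acc - p:+.4f}")
```

In this idealized model the gain is small when `p` is close to 0.5 or close to 1 and largest in between, which is in line with the empirical observation stated above. The model also predicts that voting hurts when `p` falls below 0.5, a property of the binary-majority simplification rather than a claim from the dissertation.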

  • ItemEmbargo
    Speaker Representation Learning under Self-supervised and Knowledge Transfer Setting
    (2023) Cai, Danwei

    Speaker representation learning transforms speech signals into informative vectors, underpinning many audio applications. However, deep neural networks (DNNs), pivotal in this domain, falter with limited labeled data.

    To overcome this, the thesis presents two primary strategies: self-supervised learning and knowledge transfer from automatic speech recognition (ASR). We introduce a two-stage self-supervised framework utilizing unlabeled data. The first stage focuses on representation learning, while the second integrates clustering and discriminative training. This framework is further streamlined by introducing the self-supervised reflective learning approach, central to which is self-supervised knowledge distillation, optimized to mitigate label noise effects. This approach significantly improves self-supervised speaker representation quality.

    Leveraging the relationship between ASR and speaker verification, transfer learning methods are explored to use limited training data efficiently. Techniques include initializing with ASR-pretrained encoders, ASR-based knowledge distillation, and a speaker adaptor converting ASR features to speaker-specific ones.

    Additionally, the thesis investigates voice conversion spoofing countermeasures, aiming to detect attacker identities behind conversions.

    In essence, this research offers advancements in speaker representation learning, tackling data constraints, and enhancing security against voice spoofing, ultimately fortifying audio applications.

  • ItemEmbargo
    Mass Spectrometry Technologies for Spaceflight Applications
    (2023) Aloui, Tanouir

    The National Research Council’s Planetary Science 2013-2022 Decadal Survey underscores three interrelated themes pivotal to planetary science research: understanding solar system beginnings, searching for the requirements for life, and understanding the workings of solar systems. In situ mass spectrometry (MS) is the primary technique for the analysis of planetary substances, directly addressing the critical inquiries associated with these themes. The quintessential mass analyzer engineered for space exploration is envisioned to embody a suite of features: a mass range extending from 1 u to at least 500 u, capability for high-precision measurement of stable isotope ratios within a tolerance of ±1‰, and the ability to resolve distinct isobaric species at a low mass below 60 u, all with low power requirements. Incorporation of these capabilities within a single instrument is crucial for facilitating the exploration of the necessities of life and for advancing our understanding of solar system genesis and planetary development. Nevertheless, state-of-the-art spaceflight mass spectrometers do not fully integrate all these capabilities.

    In this research, three technologies are investigated to close this gap: spatial aperture coding, super-resolution, and field emission electron sources. The development of these three technologies as presented in this dissertation represents a significant step towards a mass spectrometer having all of the characteristics described above.

    First, spatial aperture coding is a technique used to improve throughput without sacrificing resolution, historically in optical spectroscopy, and more recently, as demonstrated by our laboratory at Duke University, in sector mass spectrometry. Previously we demonstrated that aperture coding combined with a position-sensitive array detector in a miniature cycloidal mass spectrometer was successful in providing high-throughput, high-resolution measurements. However, due to poor alignment and field non-uniformities, reconstruction artifacts were present. In this dissertation, two methods were implemented to significantly reduce the presence of artifacts in reconstructed spectra. First, I employed a variable system response function across the mass range (10–110 u) instead of using a fixed function. Second, I modified the design by shifting the coded aperture slits relative to the center of the ionization volume to enable even illumination of the coded aperture slits. Both methods were successful in significantly reducing artifacts at low mass from above 35% of the peak height to less than 6% of the peak height.

    Second, higher resolution in fieldable mass spectrometers is desirable in spaceflight applications to enable resolving isobaric interferences at m/z < 60 u. Resolution in portable cycloidal MS coupled with array detectors could be improved by reducing the slit width and/or by reducing the width of the detector pixels. However, these solutions are expensive and can result in reduced sensitivity. In this dissertation, I demonstrate high-resolution spectral reconstruction in a cycloidal coded aperture miniature mass spectrometer (C-CAMMS) without changing the slit or detector pixel sizes using a class of signal processing techniques called super-resolution (SR). I developed an SR reconstruction algorithm using a sampling SR approach whereby a set of spatially shifted low-resolution measurements is reconstructed into a higher-resolution spectrum. This algorithm was applied to experimental data collected using the C-CAMMS prototype. It was then applied to synthetic data with additive noise, system response variation, and spatial shift nonuniformity to investigate the source of reconstruction artifacts in the experimental data. Experiments using two spectra shifted by 1/2 pixel yielded a resolution of 3/4 pixel full width at half maximum (FWHM) at m/z = 28 u. This resolution is equivalent to 0.013 u, six times better than the resolution previously published at m/z = 28 for N₂⁺ using C-CAMMS. However, the reconstructed spectra exhibited some artifacts. The results of the synthetic data study indicate that the artifacts are most likely caused by the system response variation. Despite these artifacts, it was shown that the super-resolution algorithm is capable of resolving the isobaric interference between N₂ and CO at m/z = 28.

    Third, field emission electron sources for MS electron ionization have been of interest for spaceflight applications due to their low power consumption compared to thermionic sources. However, state-of-the-art devices suffer from limitations such as high turn-on macroscopic field, low macroscopic current density, poor emission stability, and short lifetime. Field emitter arrays with a high spatial density of uniform emitters have the potential to address these problems. In this work, process development, fabrication, and testing of two novel field-emission-based devices are presented: CNT array emitters and metallic nanowires. Instability in CNT emission was investigated using noise analysis and a polymer encapsulation process intended to reduce the effect of adsorbates on the tips of CNTs. This treatment was not successful in reducing emission noise in CNTs. Thus, electron beam lithography and templated electrodeposition were used to fabricate a high spatial density array of metallic nanowires, resulting in electron field emission with high macroscopic current density (2 A/cm²) and low turn-on macroscopic field (4.35 V/μm). Results indicate that templated electrodeposition of metallic nanowire arrays is a promising method for producing high-performance field emitters.
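
The idea of combining spatially shifted low-resolution measurements into a higher-resolution spectrum can be illustrated with a deliberately simplified, noiseless toy. The sketch below is not the dissertation's reconstruction algorithm. It assumes that each detector pixel sums exactly two adjacent samples of a fine-grid spectrum, that the second measurement is shifted by exactly half a pixel (one fine sample), and that the first fine sample is known to be zero. The names `measure` and `reconstruct` are hypothetical.

```python
def measure(fine, offset):
    """Simulate one low-resolution measurement.

    Each coarse pixel sums two adjacent fine samples; `offset` is the shift of
    the pixel grid in fine samples (0 = unshifted, 1 = half-pixel shift).
    """
    return [fine[j] + fine[j + 1] for j in range(offset, len(fine) - 1, 2)]


def reconstruct(y_unshifted, y_shifted):
    """Recover the fine spectrum from two half-pixel-shifted measurements.

    Interleaving the two measurements gives z[j] = fine[j] + fine[j + 1] on the
    fine grid. Assuming fine[0] == 0, the pixel blur is undone recursively via
    fine[j + 1] = z[j] - fine[j]. This inverse is exact only without noise.
    """
    z = [v for pair in zip(y_unshifted, y_shifted) for v in pair]
    fine = [0.0]
    for v in z:
        fine.append(v - fine[-1])
    return fine


if __name__ == "__main__":
    # Two narrow peaks that a single coarse measurement would blur together.
    truth = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 2.0, 0.0, 0.0]
    y0 = measure(truth, 0)
    y1 = measure(truth, 1)
    print("coarse, unshifted:", y0)
    print("coarse, shifted:  ", y1)
    print("reconstructed:    ", reconstruct(y0, y1))
```

The recursive inverse accumulates any measurement error from one sample to the next, which hints at why noise and system response variation can produce reconstruction artifacts in practice. A practical method would use a regularized reconstruction instead.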

  • ItemEmbargo
    Refining Messaging Strategies to Increase Efficacy of Healthy Eating Interventions Among U.S. Black Christians
    (2023) Daly, Kaitlyn

    A disproportionate number of non-Hispanic Black men and women in the United States (U.S.) suffer from diet-related chronic diseases, including overweight and obesity, hypertension, and diabetes, compared to other racial groups. Given that most Black adults in the U.S. identify as Christian and the church is a trusted and prominent institution in the community, health promotion interventions among Black Christians have been prioritized to address diet-related health disparities impacting this population. To persuade participants to eat healthier, existing interventions have targeted an important element of behavior change: healthy eating beliefs, in particular beliefs about the benefits that result from one’s healthy eating choices. However, current methods have constraints regarding which aspects of beliefs they address, specifically in terms of belief referent (i.e., individualistic versus prosocial) and belief number (i.e., fewer versus more), as well as the persuasion technique (i.e., direct persuasion versus self-persuasion) used to convey beliefs to participants. These limitations may restrict the potential for beliefs to promote healthy eating. The purpose of this dissertation was to refine messaging strategies that target healthy eating beliefs to enhance the efficacy of existing healthy eating interventions among U.S. Black Christians aimed at reducing diet-related chronic diseases. Chapter 1 of this dissertation introduces the health-related problem, gaps in existing healthy eating interventions, and the Theory of Planned Behavior underpinning this dissertation. Chapter 2 is a qualitative descriptive study exploring healthy eating beliefs, specifically perceptions of food, faith, and health, among a multiracial sample of U.S. Christians.
Findings described four themes: (1) Healthy eating is a lifestyle; (2) Shifting from food as fuel to food for holistic health; (3) Prosocial flourishing: One’s food choices affect us all; and (4) Healthy eating is faithful eating. Theme 1 subthemes consisted of participant descriptions of (a) balanced food choices, (b) intentional eating behaviors, and (c) dominant cultures shaping universal definitions of healthy eating. Theme 2 subthemes demonstrated participant perceptions of holistic health including (a) physical health, (b) mental and emotional health, (c) social health, and (d) environmental health. In Theme 3, participants described how healthy eating benefits extended beyond personal gain to encompass the larger community. In Theme 4, participants aligned their eating habits with faith values of sanctity, stewardship, fellowship, justice, and forgiveness and compassion, illuminating the notion that healthy eating is faithful eating.

    Chapter 3 comprised two feasibility studies that pilot tested the instructions for a web-based belief elicitation experiment manipulating two factors, belief referent (individualistic v. prosocial) and belief number (2 v. 6), among two separate web-based samples of U.S. Black Christians (Pilot 1: N = 100; Pilot 2: N = 60). The main finding from Pilot 1 suggested a need to strengthen the manipulation of belief referent, as people in the prosocial conditions did not provide the correct referent, self and others, in their written responses to the belief elicitation. Instructions were refined according to participant recommendations and tested in a second pilot study. Pilot 2 demonstrated improvements in instruction comprehension and manipulation check measures, particularly for the prosocial referent group, indicating it was appropriate to conduct the main trial with no further changes to study methods.

    Chapter 4 was a between-subjects randomized controlled trial to test the effect of belief referent (individualistic v. prosocial) and belief number (2 v. 6) on healthy eating intentions, attitudes, and behaviors among a web-based sample of U.S. Black Christian adults (N = 400). Findings revealed no condition group effect on post-test healthy eating attitudes and intentions (Aim 1). Additionally, there was no evidence of healthy eating attitudes mediating any group effect on post-test healthy eating intentions (Aim 2). However, the second aim did confirm that healthy eating attitudes were a significant predictor of post-test healthy eating intentions. Additionally, results indicated that post-test healthy eating intentions correlated positively with self-reported healthy eating behavior at the one-week follow-up (Aim 3).

    Chapter 5 concludes the dissertation by synthesizing the findings across all chapters and discussing the implications and future research recommendations. This dissertation contributes new insights for nursing, public health, and theology researchers and practitioners aiming to motivate people of faith to engage in healthy eating behaviors. The synthesis of findings suggests further work is necessary to confirm the relevance and effects of holistic, prosocial, and faithful approaches to dietary health promotion for Christians. Beliefs remain an integral part of health promotion and behavior change to address the high rates of chronic disease in our nation. Future research efforts aimed at identifying effective healthy eating beliefs among diverse samples of Christians and testing their integration into comprehensive healthy eating interventions could strengthen the science of faith-based health promotion.

  • ItemEmbargo
    Investigating the Structure:Dynamics:Function Relationship of the MALAT1 Triple Helix
    (2023) Kassam, Kamillah Jena

    The “noncoding RNA (ncRNA) revolution” in the late 20th and early 21st century triggered the transition from scientists viewing ncRNA as cellular junk to realizing that ncRNA plays a variety of roles in biological functions in both healthy and disease-related processes. With this discovery came the desire to drug the transcriptome and develop therapies to ameliorate diseases that had previously been thought of as undruggable. Among the potential RNA targets discovered was the ncRNA MALAT1 (Metastasis Associated Lung Adenocarcinoma Transcript 1), a long non-coding RNA that is expressed at relatively high levels in cells. MALAT1 was first identified as a marker for lung cancer, then as an oncogenic transcript shown to be overaccumulated in several different cancer phenotypes. Previous work has shown the therapeutic potential of knocking down this transcript, making it an attractive potential target. In addition, the 3′-end of the mature MALAT1 transcript forms a U•A-U rich triple helix that evades normal cellular degradation pathways through the sequestration of an A-rich tail between two U-rich regions. Targeting this region with small molecules was shown to decrease metastasis in an organoid model, indicating the promise of the region as a drug target. However, the most effective targeting of an RNA must begin with understanding the underlying structure:dynamics:function relationship. Towards that end, this work aims to increase the understanding of the structure:dynamics:function relationship of the 3′-end of MALAT1 by: 1) probing the conformational landscape of the triple helix in different sequence contexts, 2) examining the protective function of the triple helix in different sequence contexts, and 3) investigating binding-competent structures found in the MALAT1 triple helix ensemble. In the first aim, we show that the 3′-end of MALAT1 is predicted to form modular, independently folding secondary structures.
In addition, we report evidence of non-triplex contacts forming within the triple helix, supporting the presence of alternate, non-triple helix structures in the ensemble of the MALAT1 3′-end. In the second aim, we probe the change in protective function of the triple helix within different native sequence contexts and report the development of an enzymatic assay that we believe will be of use in probing the protective function of RNA triple helices and other RNA motifs in general. In the third aim, we investigate binding-competent structures within the triple helix ensemble through use of a mutation construct.

  • ItemEmbargo
    Advancing Deep-Generated Speech and Defending against Its Misuse
    (2023) Cai, Zexin

    Deep learning has revolutionized speech generation, spanning synthesis areas such as text-to-speech and voice conversion, leading to diverse advancements. On the one hand, when trained on high-quality datasets, artificial voices now exhibit a level of synthesized quality that rivals human speech in naturalness. On the other, cutting-edge deep synthesis research is making strides in producing controllable systems, allowing for generating audio signals in arbitrary voice and speaking style.

    Yet, despite their impressive synthesis capabilities, current speech generation systems still face challenges in controlling and manipulating speech attributes. Control over crucial attributes, such as speaker identity and language, essential for enhancing the functionality of a synthesis system, still needs to be improved. Specifically, systems capable of cloning a target speaker's voice in cross-lingual contexts or replicating unseen voices are still in their nascent stages. At the same time, the heightened naturalness of synthesized speech has raised concerns, posing security threats to both humans and automated speech processing systems. The rise of accessible audio deepfakes, capable of spreading misinformation or bypassing biometric security, accentuates the complex interplay between advancing and defending against deep-synthesized speech.

    Consequently, this dissertation delves into the dynamics of deep-generated speech, viewing it from two perspectives. Offensively, we aim to enhance synthesis systems to elevate their capabilities. On the defensive side, we introduce methodologies to counter emerging audio deepfake threats, offering solutions grounded in detection-based approaches and reliable synthesis system design.

    Our research yields several noteworthy findings and conclusions. First, we present an improved voice cloning method that incorporates our novel feedback speaker consistency mechanism. Second, we demonstrate the feasibility of achieving cross-lingual multi-speaker speech synthesis with a limited amount of bilingual data, offering a synthesis method capable of producing diverse audio across various speakers and languages. Third, our proposed frame-level detection model for partially fake audio attacks proves effective in detecting tampered utterances and locating the modified regions within them. Lastly, by employing an invertible synthesis system, we can trace back to the original speaker of a converted utterance. Despite these strides, each domain of our study still confronts challenges, further fueling our motivation for persistent research and refinement of the associated performance.

  • ItemEmbargo
    Dynamics of Electrostatic Systems for Energy Conversion Applications
    (2023) Coonley, Kip D.

    The work presented here describes the electrostatic force, its nature, and its use in electromechanical systems. Energy transfer in both directions, electrical-to-mechanical and mechanical-to-electrical, is described. The electrostatic force is investigated in detail.

    Patterning of electrostatic rotary capacitive plates provides a novel strategy for up-converting low-frequency mechanical excitation sources. The rotating plates allow for output waveform signal conditioning in both control of frequency and waveform shaping. An experimental set-up consisting of a 5.08 cm (2”) diameter rotary electrostatic capacitor harvester was designed and tested at mechanical rotation frequencies ranging from 1 to 35 Hz. Quarter plates were used to double the rate of change in area. Plates were spaced 3 mm apart with an applied voltage of 6.55 kV maintained by an 8.3 nF capacitor bank. Resistive loads between 10 kΩ and 10 MΩ were used to verify current flow from the rotary capacitor. Simulation was carried out using a current source GTABLE model in PSPICE. Electrostatic theory demonstrates similar current magnitudes and the same upward trend with frequency. Further experimental analysis of a translating spring-mass system with a constant electrostatic force in the presence of viscous damping is presented and compared with simulation. A model for the linear translating electrostatic system not under the influence of viscous damping is first considered. An analytical equation is derived which provides a theoretical model for the behavior of the system, and simulation is carried out and compared with the solution and theoretical model. Next, an approximate analytical solution to the electrostatic oscillator system in the presence of viscous damping is completed and a recursive relationship for the piecewise solution is presented. Conclusions and future work suggest several avenues for further investigation where the electrostatic force in electromechanical systems could be advanced, including application areas.

  • ItemEmbargo
    Search for a Fermiophobic Beyond the Standard Model Charged Higgs boson through W γ resonances for masses below 200 GeV
    (2023) Patel, Utsav Mukesh

    A search for a beyond the Standard Model charged Higgs boson through a combined $W\gamma$ resonance is presented. The final state consists of an electron or a muon accompanied by at least one photon. The search uses 140 fb$^{-1}$ of proton-proton collision data at a centre-of-mass energy of 13 TeV collected with the ATLAS detector at the Large Hadron Collider. A binned maximum likelihood fit is performed with signal samples in the mass range of 110 to 200 GeV. The invariant mass of the $W\gamma$ system is used as a discriminant in the statistical analysis. In this analysis, we place exclusion limits on the production of a fermiophobic charged Higgs decaying to a $W\gamma$ pair, using both model-dependent and model-independent approaches. Exclusion limits are also presented for the production of a fermiophobic charged Higgs through a di-Higgs Drell-Yan process, with the requirement that at least one charged Higgs produces a $W\gamma$ pair in the final state. As such, we extend the exclusion limits to cover the beyond the Standard Model Higgs mass range of 110-200 GeV.

  • ItemEmbargo
    Conserved atypical cadherin, Fat2, regulates axon terminal organization in the developing Drosophila olfactory sensory neurons
    (2023) Vien, Khanh My

    In both insects and mammals, odor detection depends heavily on diverse classes of olfactory neurons that organize their axons to converge in a class-specific manner within the central brain’s olfactory bulb, or antennal lobe in flies. The olfactory sensory circuit is characterized by its unique and essential feature—a functionally organized topographic map. This map relies on the convergence of axons from dispersed olfactory sensory neurons of the same type into specific regions known as class-specific glomeruli. Exploring how the identity of neurons shapes this circuit organization is a central pursuit in neurobiology, given its significant implications for neurodegenerative diseases and neuronal dysfunction.

    In the olfactory system, various cell surface proteins, such as Robo/Slit and Toll receptors, govern numerous aspects of circuit organization, including axon guidance and synaptic matching. In our study, we have identified an atypical cadherin protein called Fat2 (also known as Kugelei) as a regulator of axon organization specific to neuronal classes. Fat2 is expressed in olfactory receptor neurons (ORNs) and local interneurons (LNs) within olfactory circuits, with minimal expression in projection neurons (PNs). Notably, Fat2 expression levels vary depending on neuronal class and peak during pupal development. In cases of fat2 gene mutations, we observed varying degrees of phenotypic presentations in ORN axon terminals belonging to different classes, with a notable trend toward more severe effects in classes with higher Fat2 expression. In the most extreme cases, fat2 mutations resulted in ORN degeneration. Our findings suggest that the intracellular domain of Fat2 is crucial for its role in organizing ORN axons. Specifically, during early stages of olfactory circuit development, Fat2 plays a pivotal role in coordinating axons precisely, facilitating the formation of class-specific glomerular structures.
Importantly, our research indicates that the expression of fat2 by PNs and LNs does not significantly contribute to ORN organization. Finally, we have identified potential interactors of the Fat2 intracellular domain, namely APC family proteins (Adenomatous polyposis coli) and dop (Drop out), which likely coordinate cytoskeletal remodeling essential for axon retraction during protoglomerular development. In summary, our study establishes a foundational understanding of Fat2's role in organizing the olfactory circuit and underscores the critical importance of axon behavior in the maturation of glomeruli.

  • ItemOpen Access
    From Spectral Theorem to Spectral Statistics of Large Random Matrices with Spatio-Temporal Dependencies
    (2023) Naeem, Muhammad Abdullah

    High-dimensional random dynamical systems are ubiquitous; examples include, but are not limited to, cyber-physical systems, daily returns on the stocks of the S&P 1500, and the velocity profiles of interacting particle systems around the McKean-Vlasov limit. Mathematically speaking, observed time series data can be captured via a stable $n$-dimensional linear transformation $A$ and additive randomness. System identification aims at extracting useful information about the underlying dynamical system, given a length-$N$ trajectory from it (corresponding to an $n \times N$ data matrix). We use the spectral theorem for non-Hermitian operators to show that spatio-temporal correlations are dictated by the \emph{discrepancy between the algebraic and geometric multiplicities of distinct eigenvalues} of the state transition matrix. Small discrepancies imply that the original trajectory essentially comprises multiple \emph{lower-dimensional random dynamical systems living on $A$-invariant subspaces that are statistically independent of each other}. In the process, we provide the first quantitative handle on the decay rate of finite powers of the state transition matrix, $\|A^{k}\|$. It is shown that when a stable dynamical system has only one distinct eigenvalue and a discrepancy of $n-1$, $\|A\|$ depends on $n$, the resulting dynamics are \emph{spatially inseparable}, and consequently there exists at least one row with covariates of typical size $\Theta\big(\sqrt{N-n+1}\, e^{n}\big)$; i.e., even under the stability assumption, covariates can \emph{suffer from the curse of dimensionality}.
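
The effect of a gap between algebraic and geometric multiplicity on $\|A^{k}\|$ can be seen in a small numerical experiment. The sketch below is an illustration and not taken from the thesis. It assumes a single $n \times n$ Jordan block with eigenvalue $\rho < 1$ (algebraic multiplicity $n$, geometric multiplicity 1, so a discrepancy of $n-1$) and measures the powers in the maximum-row-sum norm, chosen here only because it is easy to compute. The helper names are hypothetical.

```python
from math import comb


def jordan_block(rho, n):
    """n x n Jordan block: rho on the diagonal, ones on the superdiagonal."""
    return [[rho if i == j else 1.0 if j == i + 1 else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inf_norm(a):
    """Maximum absolute row sum (the induced infinity norm)."""
    return max(sum(abs(x) for x in row) for row in a)


def power_norms(a, kmax):
    """Return [||A^1||, ||A^2||, ..., ||A^kmax||] in the infinity norm."""
    norms, p = [], a
    for _ in range(kmax):
        norms.append(inf_norm(p))
        p = matmul(p, a)
    return norms


if __name__ == "__main__":
    rho, kmax = 0.9, 60
    for n in (1, 3, 5):
        norms = power_norms(jordan_block(rho, n), kmax)
        peak = max(norms)
        print(f"n={n}: peak ||A^k|| = {peak:.3g} at k = {norms.index(peak) + 1}")
```

All three matrices have spectral radius 0.9 and are therefore stable, yet only the $1 \times 1$ case decays monotonically. For larger blocks the entries of $A^{k}$ are $\binom{k}{j}\rho^{k-j}$, so the norm first grows, with a peak that increases with $n$, before the geometric decay takes over.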

    In light of these findings, we set the stage for non-asymptotic error analysis in the estimation of the state transition matrix $A$ via least squares regression on the observed trajectory by showing that the element-wise error is essentially a variant of the well-known Littlewood-Offord problem and can be extremely sensitive to the dimension of the state space and the number of iterations. We also show that the largest singular value of the data matrix can be cursed by dimensionality even when the state transition matrix is stable. The overarching theme of this thesis is new theoretical results on the spectral theorem for non-Hermitian operators and on the non-asymptotic behavior of high-dimensional dynamical systems, which we combine with the work of Talagrand on the concentration of measure phenomenon to better understand the behavior of structured random matrices (the data matrix) and, subsequently, the performance of different learning algorithms with dependent data. We also show that there exist stable linear Gaussians with a process-level Talagrand's inequality that is linear in the dimension of the state space (previously an open problem), along with a deterioration of mixing times as the discrepancy between the algebraic and geometric multiplicities of $A$ increases.