Browsing by Author "Li, Ming"
Advancing Deep-Generated Speech and Defending against Its Misuse (2023). Cai, Zexin. Open Access.
Deep learning has revolutionized speech generation across synthesis tasks such as text-to-speech and voice conversion. On the one hand, when trained on high-quality datasets, artificial voices now rival human speech in naturalness. On the other, cutting-edge deep synthesis research is producing controllable systems that can generate audio in an arbitrary voice and speaking style.
Yet, despite their impressive synthesis capabilities, current speech generation systems still struggle to control and manipulate speech attributes. Control over crucial attributes such as speaker identity and language, which is essential for extending what a synthesis system can do, remains limited. In particular, systems that can clone a target speaker's voice across languages or replicate unseen voices are still in their early stages. At the same time, the increased naturalness of synthesized speech poses security threats to both humans and automated speech processing systems. The rise of accessible audio deepfakes, capable of spreading misinformation or bypassing biometric security, highlights the complex interplay between advancing deep-synthesized speech and defending against it.
Consequently, this dissertation delves into the dynamics of deep-generated speech, viewing it from two perspectives. Offensively, we aim to enhance synthesis systems to elevate their capabilities. On the defensive side, we introduce methodologies to counter emerging audio deepfake threats, offering solutions grounded in detection-based approaches and reliable synthesis system design.
Our research yields several noteworthy findings. First, we present an improved voice cloning method that incorporates our novel feedback speaker consistency mechanism. Second, we demonstrate that cross-lingual multi-speaker speech synthesis is feasible with a limited amount of bilingual data, offering a method that produces diverse audio across various speakers and languages. Third, our proposed frame-level detection model for partially fake audio attacks effectively detects tampered utterances and locates the modified regions within them. Lastly, by employing an invertible synthesis system, we can trace a converted utterance back to its original speaker. Despite these strides, each area of our study still faces open challenges, motivating continued research to improve performance.
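The abstract does not say how frame-level outputs become located regions. A minimal sketch, assuming a hypothetical detector that emits one "fake" probability per frame at a fixed frame shift (10 ms by default here): frames at or above a threshold are flagged, consecutive flagged frames are merged into time regions, and an utterance counts as tampered if any frame is flagged. The function names, threshold, and frame shift are illustrative assumptions, not the dissertation's method.

```python
from typing import List, Optional, Tuple


def locate_fake_regions(
    frame_scores: List[float],
    threshold: float = 0.5,       # assumed decision threshold
    frame_shift_s: float = 0.01,  # assumed 10 ms hop between frames
    min_frames: int = 1,          # drop flagged runs shorter than this
) -> List[Tuple[float, float]]:
    """Merge consecutive frames whose 'fake' score >= threshold into (start_s, end_s) regions."""
    regions: List[Tuple[float, float]] = []
    start: Optional[int] = None  # index of the first frame in the current flagged run
    for i, score in enumerate(frame_scores):
        if score >= threshold:
            if start is None:
                start = i  # a new flagged run begins
        elif start is not None:
            # the run ended at frame i - 1; keep it only if it is long enough
            if i - start >= min_frames:
                regions.append((start * frame_shift_s, i * frame_shift_s))
            start = None
    # close a run that lasts until the final frame
    if start is not None and len(frame_scores) - start >= min_frames:
        regions.append((start * frame_shift_s, len(frame_scores) * frame_shift_s))
    return regions


def utterance_is_tampered(frame_scores: List[float], threshold: float = 0.5) -> bool:
    """Utterance-level decision: flag the utterance if any single frame is flagged."""
    return any(score >= threshold for score in frame_scores)
```

Treating "any frame flagged" as the utterance-level decision is only one simple choice; raising `min_frames` trades sensitivity to very short edits for robustness against isolated false alarms.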
Inhibition of adaptive immune responses leads to a fatal clinical outcome in SIV-infected pigtailed macaques but not vervet African green monkeys (PLoS Pathog, 2009-12). Schmitz, Jörn E; Zahn, Roland C; Brown, Charles R; Rett, Melisa D; Li, Ming; Tang, Haili; Pryputniewicz, Sarah; Byrum, Russell A; Kaur, Amitinder; Montefiori, David C; Allan, Jonathan S; Goldstein, Simoy; Hirsch, Vanessa M. Open Access.
African green monkeys (AGM) and other natural hosts for simian immunodeficiency virus (SIV) do not develop an AIDS-like disease following SIV infection. To evaluate differences in the role of SIV-specific adaptive immune responses between natural and nonnatural hosts, we used SIV(agmVer90) to infect vervet AGM and pigtailed macaques (PTM). This infection results in robust viral replication in both species but induces AIDS only in PTM. We delayed the development of adaptive immune responses by combined administration of anti-CD8 and anti-CD20 lymphocyte-depleting antibodies during primary infection of PTM (n = 4) and AGM (n = 4), and compared these animals with historical controls infected with the same virus. Lymphocyte depletion resulted in a 1-log increase in primary viremia and a 4-log increase in post-acute viremia in PTM. Three of the four PTM had to be euthanized within 6 weeks of inoculation due to massive cytomegalovirus (CMV) reactivation and disease. In contrast, all four lymphocyte-depleted AGM remained healthy. The lymphocyte-depleted AGM showed only a trend toward prolonged peak viremia, and the groups were indistinguishable during chronic infection. These data show that adaptive immune responses are critical for controlling disease progression in pathogenic SIV infection in PTM.
However, the maintenance of a disease-free course of SIV infection in AGM likely depends on a number of mechanisms, including non-adaptive immune mechanisms.
Speaker Diarization with Deep Learning: Refinement, Online Extension and ASR Integration (2023). Wang, Weiqing. Open Access.
As speech remains an essential mode of human communication, the need for advanced speaker diarization technologies has grown significantly. Speaker diarization is the process of determining who spoke when in an audio recording, and this dissertation explores this domain by systematically addressing three prevailing challenges through intertwined strands of investigation.
First, we focus on the intricacies of overlapping speech and refine conventional diarization systems by integrating sequential information. Our approach not only detects overlapping segments but also identifies the distinct speakers within them, so that each speaker is correctly labeled.
Moving on from overlapping speech, we then address the pressing need for real-time speaker diarization. In response to the growing demand for low-latency applications such as smart agents and transcription services, our research adapts traditional systems so that they operate in real time without sacrificing accuracy or efficiency.
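The abstract does not describe the online method itself. As a point of reference only, a common baseline for online diarization assigns each incoming segment embedding to the most similar running speaker centroid, or opens a new speaker when no centroid is similar enough. The sketch below assumes hypothetical fixed-length, non-zero speaker embeddings, cosine similarity, and an illustrative threshold of 0.6; the class name and parameters are assumptions, not the dissertation's system.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class OnlineSpeakerClusterer:
    """Hypothetical greedy online clustering of segment embeddings into speakers."""

    def __init__(self, threshold: float = 0.6) -> None:
        self.threshold = threshold            # minimum similarity to reuse a speaker
        self.centroids: List[List[float]] = []  # running mean embedding per speaker
        self.counts: List[int] = []             # segments assigned to each speaker

    def assign(self, emb: Sequence[float]) -> int:
        """Return a speaker index for one new segment embedding."""
        best, best_sim = -1, -math.inf
        for idx, centroid in enumerate(self.centroids):
            sim = cosine(emb, centroid)
            if sim > best_sim:
                best, best_sim = idx, sim
        if best == -1 or best_sim < self.threshold:
            # no sufficiently similar speaker yet: open a new one
            self.centroids.append(list(emb))
            self.counts.append(1)
            return len(self.centroids) - 1
        # update the matched speaker's running mean with the new embedding
        n = self.counts[best]
        self.centroids[best] = [(c * n + e) / (n + 1) for c, e in zip(self.centroids[best], emb)]
        self.counts[best] = n + 1
        return best
```

Because each decision uses only past segments, latency stays low, but early mistakes are never revised; this is one reason real-time systems are harder to make as accurate as offline ones.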
Lastly, we turn our attention to the potential of contextual and textual data. Incorporating both audio and text into speaker diarization not only strengthens the system's ability to distinguish speakers but also exploits the rich contextual cues embedded in conversations, further improving overall diarization performance.
Through a coherent and systematic exploration of these three pivotal areas, the dissertation offers substantial contributions to the field of speaker diarization. The research navigates through the challenges of overlapping speech, real-time application demands, and the integration of contextual data, ultimately presenting a refined, reliable, and efficient speaker diarization system poised for application in diverse and dynamic communication environments.
Speaker Representation Learning under Self-supervised and Knowledge Transfer Setting (2023). Cai, Danwei. Open Access.
Speaker representation learning transforms speech signals into informative vectors, underpinning many audio applications. However, deep neural networks (DNNs), which are pivotal in this domain, perform poorly when labeled data are limited.
To overcome this, the thesis presents two primary strategies: self-supervised learning and knowledge transfer from automatic speech recognition (ASR). We introduce a two-stage self-supervised framework utilizing unlabeled data. The first stage focuses on representation learning, while the second integrates clustering and discriminative training. This framework is further streamlined by introducing the self-supervised reflective learning approach, central to which is self-supervised knowledge distillation, optimized to mitigate label noise effects. This approach significantly improves self-supervised speaker representation quality.
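The abstract says the second stage combines clustering with discriminative training but does not name the clustering algorithm. As an illustration only, the sketch below swaps in plain k-means to turn unlabeled speaker embeddings into pseudo-labels that a discriminative classifier could then be trained on. The function name, deterministic initialization from the first `k` embeddings, and iteration cap are assumptions; it also assumes `k` does not exceed the number of embeddings.

```python
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pseudo_labels(embeddings: List[List[float]], k: int, iters: int = 20) -> List[int]:
    """Cluster embeddings with k-means and return one pseudo speaker label per embedding."""
    # deterministic initialization from the first k embeddings (for illustration only)
    centroids = [list(e) for e in embeddings[:k]]
    labels: List[int] = []
    for _ in range(iters):
        # assignment step: nearest centroid for every embedding
        new_labels = [min(range(k), key=lambda j: sq_dist(e, centroids[j])) for e in embeddings]
        if new_labels == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new_labels
        # update step: move each centroid to the mean of its members
        for j in range(k):
            members = [e for e, lab in zip(embeddings, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Pseudo-labels produced this way are noisy, which is the problem the abstract says self-supervised reflective learning and its knowledge-distillation component are designed to mitigate.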
Leveraging the relationship between ASR and speaker verification, transfer learning methods are explored to use limited training data efficiently. Techniques include initializing with ASR-pretrained encoders, ASR-based knowledge distillation, and a speaker adaptor converting ASR features to speaker-specific ones.
Additionally, the thesis investigates voice conversion spoofing countermeasures, aiming to detect attacker identities behind conversions.
In essence, this research offers advancements in speaker representation learning, tackling data constraints, and enhancing security against voice spoofing, ultimately fortifying audio applications.
Use of biological detection methods to assess dioxin-like compounds in sediments of Bohai Bay, China (Ecotoxicology and Environmental Safety, 2019-05). Dong, Wenjing; Wang, Feng; Fang, Mingliang; Wu, Jie; Wang, Shuaiyu; Li, Ming; Yang, Jingfeng; Chernick, Melissa; Hinton, David E; Pei, De-Sheng; Chen, Hongxing; Zheng, Na; Mu, Jingli; Xie, Lingtian; Dong, Wu. Open Access.
Bohai Bay, in the western region of northeastern China's Bohai Sea, receives water from large rivers carrying various pollutants, including dioxin-like compounds (DLCs). This study used the established zebrafish (Danio rerio) model, its known developmental toxicity endpoints, and sensitive molecular analyses to evaluate sediments near and around an industrial effluent site in Bohai Bay. The primary objective was to assess the efficacy of rapid biological detection methods as a complement to chemical analyses. Embryos were exposed to various concentrations of sediment extracts as well as a 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) positive control. Exposure to the sediment extract from the site nearest the discharge (P1) produced the most severe effects and the highest rates of change in embryos and larvae, suggesting that DLC-contaminated sediment probably does not extend much beyond this site. The P1 extract caused concentration-dependent increases in mortality and pericardial edema. Its highest concentration up-regulated cytochrome P450 1A1 (CYP1A) mRNA expression at 72 h post fertilization (hpf), increased CYP1A expression in the gill arches as observed by whole-mount in situ hybridization, and increased the signal in the Tg(cyp1a:mCherry) transgenic line. The pattern and magnitude of the response were very similar to those of TCDD and supported the presence of DLCs in these sediment samples. Follow-up chemical analysis confirmed their presence and identified H7CDF, O8CDF and O8CDD as the main components of the P1 extract.
This study validates the use of biological assays as a rapid, sensitive, and cost-effective method to evaluate DLCs and their effects in sediment samples. Additionally, it provides support for the conclusion that DLCs have limited remobilization capacity in marine sediments.