Speaker Representation Learning under Self-supervised and Knowledge Transfer Setting

Cai, Danwei

Date

2023

Authors

Cai, Danwei

Advisors

Li, Ming

Li, Xin

Abstract

Speaker representation learning transforms speech signals into informative vectors, underpinning many audio applications. However, deep neural networks (DNNs), pivotal in this domain, falter with limited labeled data.

To overcome this, the thesis presents two primary strategies: self-supervised learning and knowledge transfer from automatic speech recognition (ASR). We introduce a two-stage self-supervised framework utilizing unlabeled data. The first stage focuses on representation learning, while the second integrates clustering and discriminative training. This framework is further streamlined by introducing the self-supervised reflective learning approach, central to which is self-supervised knowledge distillation, optimized to mitigate label noise effects. This approach significantly improves self-supervised speaker representation quality.

The thesis also explores transfer learning methods that exploit the relationship between ASR and speaker verification to use limited training data efficiently. Techniques include initializing with ASR-pretrained encoders, ASR-based knowledge distillation, and a speaker adaptor that converts ASR features into speaker-specific ones.

Additionally, the thesis investigates voice conversion spoofing countermeasures, aiming to detect attacker identities behind conversions.

In essence, this research advances speaker representation learning by tackling data constraints and enhancing security against voice spoofing, ultimately fortifying audio applications.

Type

Dissertation

Department

Electrical and Computer Engineering

Subjects

Artificial intelligence, Computer engineering, Electrical engineering, Deep neural network, Self-supervised learning, Speaker recognition, Transfer learning, Voice conversion

Permalink

https://hdl.handle.net/10161/30338

Rights

https://creativecommons.org/licenses/by-nc-nd/4.0/

Citation

Cai, Danwei (2023). Speaker Representation Learning under Self-supervised and Knowledge Transfer Setting. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/30338.

Collections

Dissertations

Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.
