Speaker Representation Learning under Self-supervised and Knowledge Transfer Setting


Date

2023


Abstract

Speaker representation learning transforms speech signals into informative vectors, underpinning many audio applications. However, the deep neural networks (DNNs) pivotal in this domain falter when labeled data is limited.

To overcome this, the thesis presents two primary strategies: self-supervised learning and knowledge transfer from automatic speech recognition (ASR). We introduce a two-stage self-supervised framework utilizing unlabeled data. The first stage focuses on representation learning, while the second integrates clustering and discriminative training. This framework is further streamlined into a self-supervised reflective learning approach, whose core is self-supervised knowledge distillation optimized to mitigate the effects of label noise. This approach significantly improves the quality of self-supervised speaker representations.
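The second stage described above typically clusters the first-stage embeddings to obtain pseudo-speaker labels, which then supervise discriminative training. As a minimal sketch of that pseudo-labeling step (using plain k-means with farthest-point initialization; the thesis's actual clustering method and hyperparameters are not specified here and may differ):

```python
import numpy as np

def kmeans(x, k, iters=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [x[0]]
    for _ in range(1, k):
        # Next center: the point farthest from all chosen centers.
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        # Assign each embedding to its nearest center, then update centers.
        labels = np.argmin(((x[:, None] - centers[None]) ** 2).sum(axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels

def assign_pseudo_labels(embeddings, n_speakers):
    """Stage-2 sketch: cluster stage-1 embeddings into pseudo-speaker labels
    that a discriminative classifier can then be trained on."""
    return kmeans(embeddings, n_speakers)
```

In the full pipeline, these (noisy) pseudo-labels would drive a classification loss, which is where the distillation-based handling of label noise becomes important.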

Leveraging the relationship between ASR and speaker verification, transfer learning methods are explored to use limited training data efficiently. Techniques include initializing with ASR-pretrained encoders, ASR-based knowledge distillation, and a speaker adaptor converting ASR features to speaker-specific ones.
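The speaker adaptor mentioned above maps frame-level ASR features into a speaker embedding. A minimal sketch, assuming a simple learned projection followed by mean pooling over time (the thesis's actual adaptor architecture is not specified here; `w` and `b` are hypothetical learned parameters):

```python
import numpy as np

def speaker_adaptor(asr_frames, w, b):
    """Hypothetical adaptor sketch: project per-frame ASR features into a
    speaker-discriminative space, then mean-pool over time to produce one
    fixed-size speaker embedding per utterance."""
    projected = np.tanh(asr_frames @ w + b)  # (T, d_asr) -> (T, d_spk)
    return projected.mean(axis=0)            # (d_spk,) utterance embedding
```

Pooling over time is what converts content-bearing frame features into a single utterance-level vector suitable for speaker verification.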

Additionally, the thesis investigates voice conversion spoofing countermeasures, aiming to detect attacker identities behind conversions.

In essence, this research advances speaker representation learning under data constraints and strengthens defenses against voice spoofing, ultimately fortifying audio applications.

Citation

Cai, Danwei (2023). Speaker Representation Learning under Self-supervised and Knowledge Transfer Setting. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/30338.


Duke's student scholarship is made available to the public using a Creative Commons Attribution / Non-commercial / No derivative (CC-BY-NC-ND) license.