Speaker Diarization with Deep Learning: Refinement, Online Extension and ASR Integration

Wang, Weiqing

Date

2023

Authors

Wang, Weiqing

Advisors

Li, Ming

Li, Xin

Abstract

As speech remains an essential mode of human communication, the need for advanced speaker diarization technology has risen significantly. Speaker diarization is the process of accurately annotating individual speakers within an audio segment. This dissertation explores that domain, systematically addressing three prevailing challenges through intertwined strands of investigation.

First, we focus on the intricacies of overlapping speech and refine conventional diarization systems by integrating sequential information. Our approach not only recognizes overlapping segments but also discerns the distinct speaker identities within them, ensuring that each speaker is precisely categorized.

Moving on from overlapping speech, we then address real-time speaker diarization. In response to the growing demand for low-latency applications in fields such as smart agents and transcription services, our research adapts traditional systems so that they function in real time without sacrificing accuracy or efficiency.

Lastly, we turn our attention to the potential of contextual and textual data. Incorporating both audio and text data into speaker diarization not only augments the system's ability to distinguish speakers but also leverages the rich contextual cues often embedded in conversations, further improving overall diarization performance.

Through a coherent and systematic exploration of these three pivotal areas, the dissertation offers substantial contributions to the field of speaker diarization. The research navigates through the challenges of overlapping speech, real-time application demands, and the integration of contextual data, ultimately presenting a refined, reliable, and efficient speaker diarization system poised for application in diverse and dynamic communication environments.

Type

Dissertation

Department

Electrical and Computer Engineering

Subjects

Electrical engineering, Automatic speech recognition, Signal processing, Speaker diarization, Speech signal processing

Permalink

https://hdl.handle.net/10161/30344

Rights

https://creativecommons.org/licenses/by-nc-nd/4.0/

Citation

Wang, Weiqing (2023). Speaker Diarization with Deep Learning: Refinement, Online Extension and ASR Integration. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/30344.

Collections

Dissertations

Duke's student scholarship is made available to the public under a Creative Commons Attribution / Non-commercial / No derivative (CC-BY-NC-ND) license.
