Speaker Diarization with Deep Learning: Refinement, Online Extension and ASR Integration

dc.contributor.advisor

Li, Ming

dc.contributor.advisor

Li, Xin

dc.contributor.author

Wang, Weiqing

dc.date.accessioned

2024-03-07T18:39:44Z

dc.date.issued

2023

dc.department

Electrical and Computer Engineering

dc.description.abstract

As speech remains an essential mode of human communication, the need for advanced speaker diarization technologies has risen significantly. Speaker diarization is the process of accurately annotating individual speakers within an audio segment. This dissertation explores this domain, systematically addressing three prevailing challenges through intertwined strands of investigation.

Initially, we focus on the intricacies of overlapping speech and refine conventional diarization systems by integrating sequential information. Our approach not only recognizes overlapping segments but also discerns the distinct speaker identities within them, ensuring that each speaker is precisely categorized.

Transitioning from the challenge of overlapping speech, we then address the pressing need for real-time speaker diarization. Motivated by low-latency applications in various fields, such as smart agents and transcription services, our research adapts traditional systems so that they function seamlessly in real time without sacrificing accuracy or efficiency.

Lastly, we turn our attention to the vast potential of contextual and textual data. Incorporating both audio and text data into speaker diarization not only augments the system's ability to distinguish speakers but also leverages the rich contextual cues embedded in conversations, further improving overall diarization performance.

Through a coherent and systematic exploration of these three pivotal areas, the dissertation offers substantial contributions to the field of speaker diarization. The research navigates through the challenges of overlapping speech, real-time application demands, and the integration of contextual data, ultimately presenting a refined, reliable, and efficient speaker diarization system poised for application in diverse and dynamic communication environments.

dc.identifier.uri

https://hdl.handle.net/10161/30344

dc.rights.uri

https://creativecommons.org/licenses/by-nc-nd/4.0/

dc.subject

Electrical engineering

dc.subject

Automatic speech recognition

dc.subject

Signal processing

dc.subject

Speaker diarization

dc.subject

Speech signal processing

dc.title

Speaker Diarization with Deep Learning: Refinement, Online Extension and ASR Integration

dc.type

Dissertation

duke.embargo.months

4

duke.embargo.release

2024-07-07T18:39:44Z

Files

Original bundle

Name:
Wang_duke_0066D_17714.pdf
Size:
2.74 MB
Format:
Adobe Portable Document Format