Speaker Diarization with Deep Learning: Refinement, Online Extension and ASR Integration

dc.contributor.advisor

Li, Ming

dc.contributor.advisor

Li, Xin

dc.contributor.author

Wang, Weiqing

dc.date.accessioned

2024-03-07T18:39:44Z

dc.date.issued

2023

dc.department

Electrical and Computer Engineering

dc.description.abstract

As speech remains an essential mode of human communication, the need for advanced speaker diarization technologies has risen significantly. Speaker diarization is the process of accurately annotating individual speakers within an audio segment. This dissertation explores this domain, systematically addressing three prevailing challenges through intertwined strands of investigation.

Initially, we focus on the intricacies of overlapping speech and refine conventional diarization systems by integrating sequential information. Our approach not only recognizes overlapping segments but also discerns the distinct speaker identities within them, ensuring that each speaker is precisely categorized.

Transitioning from the challenge of overlapping speech, we then address the pressing need for real-time speaker diarization. Motivated by low-latency applications in various fields, such as smart agents and transcription services, our research adapts traditional systems so that they function seamlessly in real time without sacrificing accuracy or efficiency.

Lastly, we turn our attention to the vast potential of contextual and textual data. Incorporating both audio and text data into speaker diarization not only augments the system's ability to distinguish speakers but also leverages the rich contextual cues embedded in conversations, further improving overall diarization performance.

Through a coherent and systematic exploration of these three pivotal areas, the dissertation offers substantial contributions to the field of speaker diarization. The research navigates through the challenges of overlapping speech, real-time application demands, and the integration of contextual data, ultimately presenting a refined, reliable, and efficient speaker diarization system poised for application in diverse and dynamic communication environments.

dc.identifier.uri

https://hdl.handle.net/10161/30344

dc.rights.uri

https://creativecommons.org/licenses/by-nc-nd/4.0/

dc.subject

Electrical engineering

dc.subject

Automatic speech recognition

dc.subject

Signal processing

dc.subject

Speaker diarization

dc.subject

Speech signal processing

dc.title

Speaker Diarization with Deep Learning: Refinement, Online Extension and ASR Integration

dc.type

Dissertation

duke.embargo.months

4

duke.embargo.release

2024-07-07T18:39:44Z

Files

Original bundle

Name:
Wang_duke_0066D_17714.pdf
Size:
2.74 MB
Format:
Adobe Portable Document Format