Using Phoneme Groups to Train Deep Learning Models to Remove Reverberation in Cochlear Implant Stimulus Patterns
Date
2022
Abstract
A cochlear implant (CI) is a medical device that aims to restore sound perception and speech intelligibility to individuals with profound sensorineural hearing loss. CIs mimic the function of the auditory system by transforming an acoustic signal into a sequence of electrical pulses that stimulate the auditory nerve, resulting in the perception of sound. However, compared to normal-hearing individuals, CI users have considerable difficulty understanding speech in listening environments that contain reverberation and noise.
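The acoustic-to-electric transformation described above can be illustrated with a heavily simplified sketch, loosely modeled on FFT-based CI processing. The function name `stimulus_pattern` is hypothetical, as are its parameters: the channel count, frame and hop sizes, the 60 dB input dynamic range, and the threshold (T) and comfort (C) levels are all illustrative assumptions. None of them describe the dissertation's processing or any manufacturer's strategy. The sketch also ignores pulse timing and the interleaving of pulses across electrodes.

```python
import numpy as np

def stimulus_pattern(signal, n_channels=8, frame_len=128, hop=32,
                     t_level=100.0, c_level=200.0):
    """Hypothetical, simplified map from audio samples to per-channel pulse amplitudes.

    Returns an array of shape (n_frames, n_channels). Each entry is a pulse
    amplitude between t_level and c_level (arbitrary clinical units).
    """
    # Split the signal into overlapping Hann-windowed short-time frames.
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Magnitude spectrum of each frame: shape (n_frames, frame_len // 2 + 1).
    spec = np.abs(np.fft.rfft(frames, axis=1))
    # Group FFT bins (skipping DC) into contiguous channels, low to high frequency.
    edges = np.linspace(1, spec.shape[1], n_channels + 1).astype(int)
    env = np.stack([spec[:, lo:hi].sum(axis=1)
                    for lo, hi in zip(edges[:-1], edges[1:])], axis=1)
    # Logarithmic compression: keep the top 60 dB of the envelope, scaled to [0, 1].
    env_db = 20.0 * np.log10(env + 1e-12)
    norm = np.clip((env_db - (env_db.max() - 60.0)) / 60.0, 0.0, 1.0)
    # Map into the electrical dynamic range between the T and C levels.
    return t_level + norm * (c_level - t_level)
```

For example, a 1 kHz tone sampled at 16 kHz mostly drives the lowest channel in this toy configuration, because each FFT bin spans 125 Hz and channel 0 covers bins 1 to 8.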
One way to address this problem is time-frequency masking, in which the time-frequency representation of reverberant and/or noisy speech is multiplied elementwise by a matrix of gain values, known as a mask, to suppress reverberation and noise. Because the mask requires knowledge of the anechoic signal, which is not available in practice, an algorithm must estimate the mask from information available in the reverberant signal. Modern approaches typically estimate the mask with traditional machine learning or deep learning algorithms. However, existing mask estimation algorithms have not yet provided consistent benefits to CI users across a variety of reverberant environments.
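To make the masking operation concrete, here is a minimal sketch built on the ideal ratio mask. This is one common mask definition, but the abstract does not say which definition the dissertation uses, so treat it as an assumption. The sketch also models the reverberant magnitudes as the anechoic magnitudes plus an additive "tail", which is a toy simplification rather than a physical model of reverberation. The oracle function needs the anechoic input. That is exactly why a trained model must estimate the mask from the reverberant input alone in practice.

```python
import numpy as np

def ideal_ratio_mask(anechoic_mag, reverberant_mag, eps=1e-12):
    """Oracle gain for each time-frequency unit (requires the anechoic signal)."""
    # Ratio of anechoic to reverberant magnitude, clipped to [0, 1] so the mask only attenuates.
    return np.clip(anechoic_mag / (reverberant_mag + eps), 0.0, 1.0)

def apply_mask(reverberant_mag, mask):
    """Elementwise (Hadamard) product of the time-frequency representation and the mask."""
    return reverberant_mag * mask

# Toy (time frames x frequency channels) magnitudes. The additive
# "reverberant tail" is a simplifying assumption for illustration only.
rng = np.random.default_rng(0)
anechoic = rng.random((4, 5))
reverberant = anechoic + 0.5 * rng.random((4, 5))

mask = ideal_ratio_mask(anechoic, reverberant)
enhanced = apply_mask(reverberant, mask)  # recovers `anechoic` in this toy case
```

A mask estimation model would replace `ideal_ratio_mask` with a function of `reverberant` only. Its output would be applied in the same elementwise way.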
This dissertation proposes a mask estimation algorithm that explicitly leverages knowledge of the individual units of speech known as phonemes. Because the acoustic energy of different phonemes is typically concentrated in different frequency ranges, it is hypothesized that a phoneme-specific approach to mask estimation can provide larger benefits than a phoneme-independent approach. The phoneme-specific algorithm first predicts the phoneme and then activates a mask estimation model specific to the detected phoneme. The first step in this work was to determine the upper bound of the phoneme-specific approach by testing normal-hearing listeners in the ideal case, where the phoneme is assumed to be known perfectly. Because the phoneme is not known in advance during real-time processing, the second step was to develop a phoneme classification algorithm that categorizes the phoneme within the real-time constraints of a CI. The third step was to assess the phoneme-specific algorithm in the non-ideal case, where the phoneme classification algorithm selects the appropriate phoneme-specific mask estimation model. To evaluate the impact of the phoneme-specific mask estimation algorithm on speech intelligibility, a further study was conducted with normal-hearing listeners and a CI user. Overall, the findings of this dissertation suggest that incorporating the phonemic structure of speech into speech enhancement algorithms is beneficial.
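The two-stage structure described above can be sketched as follows: first classify the phoneme, then dispatch to that phoneme's mask model. Passing an oracle label corresponds to the ideal case, and using the classifier corresponds to the non-ideal case. Everything here is hypothetical and stands in for trained models: the two group names, the energy-comparison classifier, and the fixed band-pass "mask models". The sketch only relies on the general tendency for vowel energy to sit lower in frequency than fricative energy. It does not reproduce the dissertation's phoneme groups, classifier, or networks.

```python
import numpy as np

def toy_classifier(features):
    # Hypothetical stand-in for a trained phoneme-group classifier:
    # compares energy in the lower and upper halves of the frequency axis.
    half = features.shape[0] // 2
    return "vowel" if features[:half].sum() >= features[half:].sum() else "fricative"

def make_band_mask_model(keep_low):
    # Hypothetical stand-in for a trained, group-specific mask estimator:
    # gain 1.0 on one half of the spectrum and 0.1 on the other half.
    def model(features):
        n = features.shape[0]
        mask = np.full(n, 0.1)
        if keep_low:
            mask[: n // 2] = 1.0
        else:
            mask[n // 2:] = 1.0
        return mask
    return model

MODELS = {"vowel": make_band_mask_model(True),
          "fricative": make_band_mask_model(False)}

def phoneme_specific_mask(features, classify=toy_classifier, models=MODELS,
                          oracle_label=None):
    # Step 1: choose the phoneme group (oracle label in the ideal case,
    # classifier prediction in the non-ideal case).
    label = oracle_label if oracle_label is not None else classify(features)
    # Step 2: run only the mask model associated with that group.
    return models[label](features), label

frame = np.array([5.0, 4.0, 1.0, 0.5])  # one toy frame of channel magnitudes
predicted_mask, predicted_label = phoneme_specific_mask(frame)
oracle_mask, oracle_label = phoneme_specific_mask(frame, oracle_label="fricative")
```

Under this design, the gap between the oracle and classifier-driven results shows how much performance is lost to classification errors. This mirrors the comparison between the ideal and non-ideal cases described above.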
Citation
Chu, Kevin (2022). Using Phoneme Groups to Train Deep Learning Models to Remove Reverberation in Cochlear Implant Stimulus Patterns. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/25141.
Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.