Using Phoneme Groups to Train Deep Learning Models to Remove Reverberation in Cochlear Implant Stimulus Patterns
A cochlear implant (CI) is a medical device that aims to restore sound perception and speech intelligibility to individuals with profound sensorineural hearing loss. CIs mimic the function of the auditory system by transforming an acoustic signal into a sequence of electrical pulses that stimulate the auditory nerve, resulting in the perception of sound. However, compared to normal-hearing individuals, CI users experience considerable difficulty understanding speech in listening environments that contain reverberation and noise.
One way to address this problem is through time-frequency masking, where the time-frequency representation of reverberant and/or noisy speech is elementwise multiplied by a matrix of gain values, known as a mask, in an attempt to suppress reverberation and noise. Because computing the ideal mask requires knowledge of the anechoic signal, an algorithm must be developed to estimate the mask from information available in the reverberant signal alone. Modern approaches typically estimate the mask with traditional machine learning or deep learning models. However, existing mask estimation algorithms have not yet provided consistent benefits to CI users across a variety of reverberant environments.
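The masking operation described above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the ideal ratio mask (IRM), one common oracle mask definition; the toy spectrograms stand in for the magnitude STFTs of the anechoic and reverberant signals and are not the dissertation's actual data or mask formulation.

```python
import numpy as np

# Toy magnitude spectrograms (frequency bins x time frames); in practice
# these would come from an STFT of the anechoic and reverberant signals.
rng = np.random.default_rng(0)
anechoic = rng.random((4, 5))                  # clean signal magnitudes
reverb = anechoic + 0.5 * rng.random((4, 5))   # clean plus added reverberant energy

# Ideal ratio mask (IRM): oracle gains in [0, 1], one common mask choice.
# Requires the anechoic signal, hence the need for an estimation algorithm.
mask = anechoic / (reverb + 1e-8)

# Time-frequency masking: elementwise multiply the reverberant
# representation by the gain matrix to suppress reverberant energy.
enhanced = mask * reverb
print(enhanced.shape)
```

With the oracle mask, the enhanced spectrogram recovers the anechoic magnitudes almost exactly; a mask *estimator* tries to approach this without access to the clean signal.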
This dissertation proposes a mask estimation algorithm that explicitly leverages knowledge of individual units of speech known as phonemes. Because the acoustic energy of phonemes is typically concentrated in different frequency ranges, it is hypothesized that a phoneme-based approach to mask estimation can provide larger benefits than a phoneme-independent approach. The phoneme-specific algorithm operates by first predicting the phoneme and then activating a mask estimation model specific to the detected phoneme. The first step in this work was to determine the upper bound of the phoneme-specific approach by testing normal-hearing listeners in the ideal case where the phoneme is assumed to be known perfectly. Because the phoneme is unknown in real time, the second step was to develop a phoneme classification algorithm that categorizes the phoneme within the real-time constraints of a CI. The third step was to assess the phoneme-specific algorithm in the non-ideal case where the phoneme classification algorithm selects the appropriate phoneme-specific mask estimation model. To evaluate the impact of the phoneme-specific mask estimation algorithm on speech intelligibility, a further study was conducted in normal-hearing listeners and in a CI user. Overall, the findings of this dissertation suggest that incorporating the phonemic structure of speech into speech enhancement algorithms is beneficial.
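The classify-then-enhance pipeline described above can be sketched as follows. Everything here is a hypothetical stand-in: the frame-level classifier, the two phoneme groups, and the per-group mask estimators are stubs chosen for illustration, not the dissertation's trained models or its actual phoneme inventory.

```python
import numpy as np

def classify_phoneme_group(frame):
    # Stub classifier: if low-frequency energy dominates, call the frame
    # a "vowel", otherwise a "fricative". A real system would use a
    # trained model operating under the CI's real-time constraints.
    half = len(frame) // 2
    low, high = frame[:half].sum(), frame[half:].sum()
    return "vowel" if low >= high else "fricative"

# One mask estimation model per phoneme group (stubs with fixed gains).
estimators = {
    "vowel": lambda frame: np.full_like(frame, 0.9),
    "fricative": lambda frame: np.full_like(frame, 0.6),
}

def enhance(frames):
    out = []
    for frame in frames:
        group = classify_phoneme_group(frame)   # step 1: predict phoneme group
        mask = estimators[group](frame)         # step 2: group-specific mask
        out.append(mask * frame)                # step 3: apply gains elementwise
    return np.stack(out)

frames = np.array([[3.0, 2.0, 0.5, 0.5],   # low-frequency dominant frame
                   [0.2, 0.3, 2.0, 3.0]])  # high-frequency dominant frame
print(enhance(frames))
```

The dispatch-by-group structure is the point of the sketch: each detected phoneme group routes the frame to its own mask estimator, mirroring the ideal-versus-non-ideal comparison in the work (a perfect classifier versus a real one selecting the model).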
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Rights for Collection: Duke Dissertations
Works are deposited here by their authors, and represent their research and opinions, not those of Duke University. Some materials and descriptions may include offensive content.