Optimizing Point of Care Interaction with Speech Recognition to Preserve Learnability and Intelligibility
Keywords:
Environment-optimized algorithms, speech enhancement, speech intelligibility.
Abstract
Although most speech enhancement algorithms improve speech quality, they may not improve speech intelligibility in noisy environments.
This work examines the development of an algorithm that can be tailored to a particular acoustic environment to improve speech intelligibility. The proposed technique decomposes the input signal into time-frequency (T-F) units and uses a Bayesian classifier to make a binary decision on whether each T-F unit is dominated by the target signal or by the noise masker. Target-dominated T-F units are retained, while masker-dominated T-F units are discarded. The Bayesian classifier is trained for each acoustic environment using an incremental method that continuously updates the model parameters as more data become available.
Listening tests were performed to evaluate the intelligibility of speech synthesized using the incrementally adapted models as a function of the number of training sentences. The results demonstrated significant improvements in intelligibility, exceeding 60% in babble noise at a 5 dB signal-to-noise ratio, with a minimum of 10 training phrases in babble noise and at least 80 words in other noisy environments.
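The classification and masking steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the features or the class-conditional densities, so the sketch assumes a single scalar feature per T-F unit, one Gaussian per class, and training labels that are already available. The names `RunningGaussian`, `IncrementalBayesMask`, and `apply_binary_mask`, as well as all numeric values, are hypothetical.

```python
import math


class RunningGaussian:
    """Mean and variance of a scalar feature, updated one sample at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        # Welford's online update: no past samples need to be stored.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def log_pdf(self, x):
        # Variance floor keeps the density finite when few samples were seen.
        var = max(self.m2 / self.n, 1e-6) if self.n > 0 else 1.0
        return -0.5 * (math.log(2 * math.pi * var) + (x - self.mean) ** 2 / var)


class IncrementalBayesMask:
    """Two-class Bayesian classifier: target-dominated (1) vs masker-dominated (0)."""

    def __init__(self):
        self.models = {0: RunningGaussian(), 1: RunningGaussian()}

    def update(self, features, labels):
        """Fold one more batch of labeled training samples into the model."""
        for x, y in zip(features, labels):
            self.models[y].update(x)

    def classify(self, x):
        total = self.models[0].n + self.models[1].n
        scores = {}
        for label, model in self.models.items():
            prior = (model.n + 1) / (total + 2)  # smoothed class prior
            scores[label] = math.log(prior) + model.log_pdf(x)
        return 1 if scores[1] > scores[0] else 0


def apply_binary_mask(tf_units, features, classifier):
    """Keep T-F units classified as target-dominated; zero out the rest."""
    return [
        [value if classifier.classify(feat) == 1 else 0.0
         for value, feat in zip(row_values, row_feats)]
        for row_values, row_feats in zip(tf_units, features)
    ]


clf = IncrementalBayesMask()
# Two hypothetical training batches (assumed: one per training sentence).
clf.update([8.0, 6.5, -7.0, -5.5], [1, 1, 0, 0])
clf.update([7.2, -6.1, 5.9, -8.3], [1, 0, 1, 0])

noisy = [[0.9, 0.2], [0.4, 0.7]]    # toy T-F magnitudes
feats = [[7.5, -6.0], [-4.0, 6.8]]  # toy per-unit feature values
masked = apply_binary_mask(noisy, feats, clf)
```

With these toy values, the two units whose feature values are strongly negative are classified as masker-dominated and set to zero, while the other two are kept unchanged. Because each call to `update` only adjusts running statistics, further training data for the same acoustic environment can be folded in without retraining from scratch.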
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license requires users to give appropriate credit, provide a link to the license, and indicate if changes were made. If they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.


