Optimizing Point of Care Interaction with Speech Recognition to Preserve Learnability and Intelligibility

Authors

  • M Narasimha Rao, A Venkata Raju, Matta Venkata Durga Pavan Kumar

Keywords:

Environment-optimized algorithms, speech enhancement, speech intelligibility.

Abstract

Although most speech enhancement algorithms improve speech quality, they may not improve speech intelligibility in noisy environments. This work examines the development of an algorithm that can be tailored to a particular acoustic environment to improve speech intelligibility. The proposed technique decomposes the input signal into time-frequency (T-F) units and uses a Bayesian classifier to make a binary decision on whether each T-F unit is dominated by the target signal or by the noise masker. Target-dominated T-F units are retained, and masker-dominated T-F units are discarded. The Bayesian classifier is trained for each acoustic environment using an incremental approach that continuously updates the model parameters as more data become available.
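The classify-then-mask step described above can be illustrated with a minimal sketch. The abstract does not specify the features or the form of the class models, so everything here is an assumption made for illustration: each T-F unit is summarized by one scalar feature, each class (target-dominated, masker-dominated) is modeled by a single Gaussian whose mean and variance are updated incrementally with Welford's method, and the names `IncrementalGaussianClass`, `BinaryMaskClassifier`, and `apply_binary_mask` are hypothetical.

```python
import math


class IncrementalGaussianClass:
    """Running mean/variance of a scalar feature for one class (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        # Fold one new training sample into the running statistics.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def log_likelihood(self, x, var_floor=1e-3):
        # Assumption: fall back to unit variance until two samples are seen.
        var = max(self.m2 / self.n, var_floor) if self.n > 1 else 1.0
        return -0.5 * (math.log(2 * math.pi * var) + (x - self.mean) ** 2 / var)


class BinaryMaskClassifier:
    """Two-class Bayesian decision: target-dominated vs. masker-dominated."""

    def __init__(self):
        self.target = IncrementalGaussianClass()
        self.masker = IncrementalGaussianClass()

    def update(self, feature, is_target):
        # Incremental training: each labeled T-F unit adjusts one class model.
        (self.target if is_target else self.masker).update(feature)

    def is_target_dominated(self, feature):
        if self.target.n == 0 or self.masker.n == 0:
            raise ValueError("both classes need at least one training sample")
        total = self.target.n + self.masker.n
        # Compare log posteriors: log prior + log likelihood for each class.
        log_post_t = math.log(self.target.n / total) + self.target.log_likelihood(feature)
        log_post_m = math.log(self.masker.n / total) + self.masker.log_likelihood(feature)
        return log_post_t > log_post_m


def apply_binary_mask(tf_units, features, clf):
    """Keep units classified as target-dominated; zero out the rest."""
    return [
        [u if clf.is_target_dominated(f) else 0.0 for u, f in zip(row_u, row_f)]
        for row_u, row_f in zip(tf_units, features)
    ]


# Toy training data (made up): target features near +8, masker features near -8.
clf = BinaryMaskClassifier()
for f in (6.0, 8.0, 10.0, 7.0, 9.0):
    clf.update(f, is_target=True)
for f in (-6.0, -8.0, -10.0, -7.0, -9.0):
    clf.update(f, is_target=False)

tf_units = [[1.0, 2.0], [3.0, 4.0]]    # magnitudes of four T-F units
features = [[7.5, -7.5], [-9.0, 8.5]]  # one scalar feature per unit
masked = apply_binary_mask(tf_units, features, clf)
# masked == [[1.0, 0.0], [0.0, 4.0]]
```

Because each class keeps only a count, a mean, and a sum of squared deviations, more training sentences from the same acoustic environment can be folded in one unit at a time without storing or revisiting earlier data. A practical system would likely use multidimensional features and richer class models than this single-Gaussian sketch.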

Listening tests were conducted to assess the intelligibility of speech synthesized using the incrementally updated models as a function of the number of training sentences. The results showed substantial improvements in intelligibility, exceeding 60% in babble noise at a 5 dB signal-to-noise ratio, with as few as 10 training phrases in babble and at least 80 words in other noisy environments.

Published

30.10.2024

How to Cite

M Narasimha Rao. (2024). Optimizing Point of Care Interaction with Speech Recognition to Preserve Learnability and Intelligibility. International Journal of Intelligent Systems and Applications in Engineering, 12(4), 5702 –. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/7513

Section

Research Article