Optimizing Point of Care Interaction with Speech Recognition to Preserve Learnability and Intelligibility

Authors

M Narasimha Rao, A Venkata Raju, Matta Venkata Durga Pavan Kumar

Keywords:

Environment-optimized algorithms, speech enhancement, speech intelligibility.

Abstract

Although most speech enhancement algorithms improve speech quality, they may not improve speech intelligibility in noisy environments.
This work examines the development of an algorithm that can be tailored to a particular acoustic environment to improve speech intelligibility. The proposed technique decomposes the input signal into time-frequency (T-F) units and uses a Bayesian classifier to make a binary decision on whether each T-F unit is dominated by the target signal or by the noise masker. Target-dominated T-F units are retained, while masker-dominated T-F units are discarded. The Bayesian classifier is trained for each acoustic environment using an incremental method that continuously updates the model parameters as more data become available.
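The pipeline described above (classify each T-F unit, keep or discard it, and refine the classifier as more labelled data arrives) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the abstract does not state which features or which class-conditional model the classifier uses, so the sketch takes a single diagonal-covariance Gaussian per class, a hypothetical two-dimensional feature vector per T-F unit, and synthetic data. The names `IncrementalGaussianClassifier`, `apply_binary_mask`, and `make_batch` are likewise hypothetical.

```python
import numpy as np

class IncrementalGaussianClassifier:
    """Two-class Bayesian classifier with one diagonal Gaussian per class.

    Class 1 = target-dominated unit, class 0 = masker-dominated unit.
    Running counts, means, and squared deviations are kept so that new
    training batches can be folded in without revisiting old data.
    (Assumed model for illustration, not the one used in the study.)
    """

    def __init__(self, n_features):
        self.count = np.zeros(2)
        self.mean = np.zeros((2, n_features))
        self.m2 = np.zeros((2, n_features))  # sum of squared deviations

    def update(self, features, labels):
        """Merge a new labelled batch into the running statistics."""
        for c in (0, 1):
            x = features[labels == c]
            if len(x) == 0:
                continue
            n_old, n_new = self.count[c], len(x)
            n_tot = n_old + n_new
            batch_mean = x.mean(axis=0)
            delta = batch_mean - self.mean[c]
            # Pairwise merge of two sets of mean/variance statistics.
            self.m2[c] += ((x - batch_mean) ** 2).sum(axis=0) \
                + delta ** 2 * n_old * n_new / n_tot
            self.mean[c] += delta * n_new / n_tot
            self.count[c] = n_tot

    def predict(self, features):
        """Return 1 where the target class has the higher posterior."""
        log_post = []
        for c in (0, 1):
            var = self.m2[c] / max(self.count[c], 1) + 1e-6
            log_lik = -0.5 * (np.log(2 * np.pi * var)
                              + (features - self.mean[c]) ** 2 / var).sum(axis=1)
            log_prior = np.log(self.count[c] / self.count.sum())
            log_post.append(log_lik + log_prior)
        return (log_post[1] > log_post[0]).astype(int)

def apply_binary_mask(noisy_units, mask):
    """Keep units where mask == 1 and zero out units where mask == 0."""
    return noisy_units * mask

# Synthetic demonstration with made-up data (not data from the study).
rng = np.random.default_rng(0)

def make_batch(n):
    labels = rng.integers(0, 2, size=n)
    centers = np.where(labels[:, None] == 1, 2.0, -2.0)
    features = centers + rng.normal(size=(n, 2))
    return features, labels

clf = IncrementalGaussianClassifier(n_features=2)
for _ in range(5):                      # five incremental training batches
    clf.update(*make_batch(200))

test_features, test_labels = make_batch(1000)
mask = clf.predict(test_features)
accuracy = (mask == test_labels).mean()

noisy_units = rng.random(1000) + 0.1    # stand-in T-F magnitudes
enhanced_units = apply_binary_mask(noisy_units, mask)
```

Because `update` only merges batch statistics into the running totals, the model after five batches matches one fitted on all the data at once, which is the property that makes an incremental, per-environment training scheme practical.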

Listening tests were performed to evaluate the intelligibility of speech synthesized using the incrementally updated models as a function of the number of training sentences. The results showed substantial improvements in intelligibility, exceeding 60% in babble noise at a 5 dB signal-to-noise ratio (SNR), with as few as 10 training phrases in babble and at least 80 words in other noisy environments.
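For readers unfamiliar with how a test condition at a given signal-to-noise ratio is constructed, the following is a minimal sketch of scaling a noise signal so that the mixture has a requested ratio in decibels. It is not taken from the study: the helper names `mix_at_snr` and `measure_snr_db`, the sine-wave stand-in for speech, and the white-noise stand-in for the masker are all assumptions for illustration.

```python
import numpy as np

def mix_at_snr(speech, noise, target_snr_db):
    """Scale `noise` so that speech power / noise power matches target_snr_db."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(speech_power / (noise_power * 10 ** (target_snr_db / 10)))
    return speech + gain * noise, gain

def measure_snr_db(speech, noise):
    """Signal-to-noise ratio in dB, computed from mean signal powers."""
    return 10 * np.log10(np.mean(speech ** 2) / np.mean(noise ** 2))

# Stand-in signals: a 220 Hz tone for speech, white noise for the masker.
rng = np.random.default_rng(1)
speech = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
noise = rng.normal(size=16000)

mixture, gain = mix_at_snr(speech, noise, 5.0)
achieved = measure_snr_db(speech, gain * noise)
```

The gain follows directly from the definition of the ratio in decibels, 10 log10(P_speech / P_noise), solved for the noise scale factor.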
