Multi-Microphone Speech Dereverberation and Noise Reduction using Long Short-Term Memory Networks

Authors

  • Seema Arote, Department of Electronics and Telecommunication Engineering, Vishwakarma Institute of Technology, Pune, India
  • Vijay Mane, Department of Electronics and Telecommunication Engineering, Vishwakarma Institute of Technology, Pune, India
  • Dattaray Bormane, Department of Electronics and Telecommunication Engineering, AISSMS College of Engineering, Pune, India
  • Shakil S. Shaikh, Department of Electronics and Computer Engineering, Pravara Rural Engineering College, Loni, India

Keywords:

Reverberation, Dereverberation, Room Impulse Response (RIR), Long Short-Term Memory (LSTM), Reinforcement Learning (RL), Signal-to-Noise Ratio (SNR)

Abstract

In the field of speech signal analysis, deep learning has recently demonstrated substantial advantages. Traditional speech dereverberation approaches underperform under severe reverberation, particularly in the presence of noise and variations in the source-to-array distance. This work proposes a novel approach to speech dereverberation and denoising based on enhanced reinforcement learning (RL). In this method, reverberant and noisy speech features (such as the logarithmic spectrum) are mapped to the corresponding coefficients of clean speech using a Long Short-Term Memory (LSTM) network trained on a clean/distorted speech corpus. The proposed method is intuitive and effective in reducing the distorting effects of reverberation and ambient noise. Extensive experiments demonstrate that it significantly improves predicted speech quality and intelligibility in noisy environments across a range of test conditions, including source-to-array distance, reverberation time (RT), noise type, and signal-to-noise ratio (SNR). The method consistently outperforms all benchmark techniques and improves speech quality even in low-SNR environments.
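The core idea described above, mapping log-magnitude spectra of reverberant/noisy speech to those of clean speech with an LSTM trained on paired clean/distorted utterances, can be sketched as follows. This is an illustrative reconstruction in PyTorch, not the authors' implementation; the model sizes, STFT parameters, and the random stand-in signals are assumptions for demonstration only.

```python
# Minimal sketch (assumed parameters, not the authors' code): an LSTM that
# maps log-magnitude spectra of reverberant/noisy speech to clean spectra.
import torch
import torch.nn as nn

N_FFT = 512
N_BINS = N_FFT // 2 + 1  # frequency bins per STFT frame

class SpectralMappingLSTM(nn.Module):
    def __init__(self, hidden=256, layers=2):
        super().__init__()
        self.lstm = nn.LSTM(N_BINS, hidden, num_layers=layers, batch_first=True)
        self.out = nn.Linear(hidden, N_BINS)

    def forward(self, noisy_logspec):           # (batch, frames, N_BINS)
        h, _ = self.lstm(noisy_logspec)
        return self.out(h)                      # predicted clean log-spectrum

def log_spectrum(wave):
    """Log-magnitude STFT features, one row per frame."""
    spec = torch.stft(wave, n_fft=N_FFT, hop_length=N_FFT // 2,
                      window=torch.hann_window(N_FFT), return_complex=True)
    return torch.log(spec.abs() + 1e-8).transpose(-1, -2)  # (batch, frames, bins)

# One illustrative training step on random stand-in waveforms; in practice
# these would come from a paired clean/distorted speech corpus.
model = SpectralMappingLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
noisy = log_spectrum(torch.randn(1, 16000))     # 1 s of "reverberant" audio
clean = log_spectrum(torch.randn(1, 16000))     # paired "clean" target
loss = nn.functional.mse_loss(model(noisy), clean)
loss.backward()
opt.step()
```

At inference time, the predicted log-spectrum would be combined with the noisy phase and inverted back to a waveform, which is the usual practice for spectral-mapping enhancement systems.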

References

S. Gul et al., "Preserving the beamforming effect for spatial cue-based pseudo-binaural dereverberation of a single source," Computer Speech & Language, vol. 77, 2023, 101445, ISSN 0885-2308, https://doi.org/10.1016/j.csl.2022.101445.

J. Bruyninckx, "Tuning the office sound masking and the architectonics of office work," published online 03 Feb 2023, https://doi.org/10.1080/20551940.2022.2162765.

Y. Li et al., "A Composite T60 Regression and Classification Approach for Speech Dereverberation," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 1013-1023, 2023, https://doi.org/10.1109/TASLP.2023.3245423.

N. Prodi et al., "Comparing the effects of scattered and specular sound reflections on speech intelligibility in rooms," Building and Environment, vol. 228, 2023, 109881, ISSN 0360-1323, https://doi.org/10.1016/j.buildenv.2022.109881.

L. Passos et al., "Multimodal audio-visual information fusion using canonical-correlated Graph Neural Network for energy-efficient speech enhancement," Information Fusion, vol. 90, 2023, pp. 1-11, ISSN 1566-2535, https://doi.org/10.1016/j.inffus.2022.09.006.

J. Sheeja et al., "Speech dereverberation and source separation using DNN-WPE and LWPR-PCA," Neural Computing and Applications (2023). https://doi.org/10.1007/s00521-022-07884-0.

Y. Xu, J. Du, L. R. Dai, and C. H. Lee, “An experimental study on speech enhancement based on deep neural networks,” IEEE Signal Processing Letters, vol. 21, no. 1, pp. 65–68, 2013.

D. Wang and J. Chen, “Supervised speech separation based on deep learning: an overview,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 10, pp. 1702–1726, 2018.

Z. Jin and D. Wang, “A supervised learning approach to monaural segregation of reverberant speech,” IEEE Trans. Audio Speech Lang. Process., vol. 17, no. 4, pp. 625–638, 2009.

X. Li, J. Li, and Y. Yan, “Ideal ratio mask estimation using deep neural networks for monaural speech segregation in noisy reverberant conditions,” in Proceedings of the INTERSPEECH, pp. 1203–1207, Stockholm, Sweden, August 2017.

M. Kolbaek, D. Yu, Z.-H. Tan et al., “Multitalker speech separation with utterance-level permutation invariant training of deep recurrent neural networks,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 10, pp. 1901–1913, 2017.

H. Erdogan, J. R. Hershey, S. Watanabe, and J. L. Roux, “Phase sensitive and recognition-boosted speech separation using deep recurrent neural networks,” in Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 708–712, Brisbane, Australia, April 2015.

D. S. Williamson and D. L. Wang, "Speech dereverberation and denoising using complex ratio masks," in Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5590–5594, New Orleans, LA, USA, March 2017.

Z.-Q. Wang and D. Wang, "Deep Learning Based Target Cancellation for Speech Dereverberation," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 941-950, 2020, https://doi.org/10.1109/TASLP.2020.2975902.

H. Chen and P. Zhang, "A dual-stream deep attractor network with multi-domain learning for speech dereverberation and separation," Neural Networks, vol. 141, 2021, pp. 238-248, ISSN 0893-6080.

H. Wang et al., "TeCANet: Temporal-Contextual Attention Network for Environment-Aware Speech Dereverberation," arXiv:2103.16849.

Y. Fu et al., "DESNet: A Multi-Channel Network for Simultaneous Speech Dereverberation, Enhancement and Separation," 2021 IEEE Spoken Language Technology Workshop (SLT), 2021, pp. 857-864, https://doi.org/10.1109/SLT48900.2021.9383604.

C. Fan et al., "Simultaneous Denoising and Dereverberation Using Deep Embedding Features," arXiv:2004.02420, 6 Apr 2020.

H. Li et al., "Robust Speech Dereverberation Based on WPE and Deep Learning," 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2020, pp. 52-56.

X. Xiao et al., "Speech dereverberation for enhancement and recognition using dynamic features constrained deep neural networks and feature adaptation," EURASIP Journal on Advances in Signal Processing, vol. 2016, article 4 (2016). https://doi.org/10.1186/s13634-015-0300-4.

K. Han et al., "Learning Spectral Mapping for Speech Dereverberation and Denoising," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 6, pp. 982-992, June 2015, https://doi.org/10.1109/TASLP.2015.2416653.

Y. Masuyama et al., "End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation," 2022 IEEE Spoken Language Technology Workshop (SLT), Doha, Qatar, 2023, pp. 260-265, https://doi.org/10.1109/SLT54892.2023.10023199.

"IEEE Recommended Practice for Speech Quality Measurements," in IEEE Transactions on Audio and Electroacoustics, vol. 17, no. 3, pp. 225-246, September 1969

Allen, J. B., Berkley, D. A.: Image method for efficiently simulating small-room acoustics. J. Acoust. Soc. Amer. 65(4), 943–950 (1979).

Published

25.12.2023

How to Cite

Arote, S., Mane, V., Bormane, D., & Shaikh, S. S. (2023). Multi-Microphone Speech Dereverberation and Noise Reduction using Long Short-Term Memory Networks. International Journal of Intelligent Systems and Applications in Engineering, 12(2), 01–09. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/4196

Issue

Section

Research Article