Multichannel Speech Dereverberation using Generalized Regression Neural Network
Keywords:
Reverberation, Dereverberation, Room Impulse Response (RIR), Generalized Regression Neural Network (GRNN), Signal-to-Noise Ratio (SNR)
Abstract
When speech is recorded in an enclosed room, it is corrupted by reverberation (echoes) and by background noise present in the room. This degrades the quality of the speech signal and poses a challenge for many speech-related systems, including automatic speech recognition and speaker recognition. The Generalized Regression Neural Network (GRNN), a single-pass learning method, is known for its ability to train quickly on sparse data sets. In this paper, a GRNN-based approach is implemented that addresses the combined effects of noise and reverberation. The proposed approach consists of two phases: a preprocessing phase comprising framing and feature extraction, and a dereverberation and denoising phase that uses the GRNN. The approach is evaluated under noisy conditions with varying noise, reverberation times and signal-to-noise ratios. Experimental results show that the proposed method outperforms the existing technique on objective quality measures: Short-Time Objective Intelligibility (STOI) improves by 5.93% and Perceptual Evaluation of Speech Quality (PESQ) improves by 64.73%.
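The single-pass property of the GRNN follows from how it works: in Specht's (1991) formulation the network stores every training pair and predicts a new output as a Gaussian-kernel-weighted average of the stored targets, so "training" is just storing samples and only the kernel spread sigma needs tuning. The minimal Python/NumPy sketch below illustrates that estimator only. It assumes each row of `x_train` is a feature vector from a noisy, reverberant frame and the matching row of `y_train` is the corresponding clean-speech feature vector; the abstract does not state which features or which spread the paper uses, so `grnn_predict`, its arguments and the `sigma` values are hypothetical.

```python
import numpy as np


def grnn_predict(x_train, y_train, x_query, sigma=0.5):
    """Predict outputs with a Generalized Regression Neural Network (GRNN).

    Hypothetical helper implementing Specht's estimator
        y_hat(x) = sum_i y_i * exp(-||x - x_i||^2 / (2 sigma^2))
                   / sum_i exp(-||x - x_i||^2 / (2 sigma^2))

    x_train: (N, D) stored input patterns (assumed: noisy reverberant frame features)
    y_train: (N, K) or (N,) stored targets (assumed: clean frame features)
    x_query: (M, D) inputs to map
    sigma:   spread of the Gaussian kernel (the only tunable parameter)
    Returns an (M, K) array of predictions.
    """
    x_train = np.asarray(x_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    if y_train.ndim == 1:
        y_train = y_train[:, np.newaxis]  # treat a 1-D target as a single output column
    if x_train.ndim != 2 or x_query.ndim != 2:
        raise ValueError("x_train and x_query must be 2-D (samples x features)")

    # Pattern layer: squared Euclidean distance from each query to each stored pattern, shape (M, N)
    d2 = ((x_query[:, np.newaxis, :] - x_train[np.newaxis, :, :]) ** 2).sum(axis=-1)

    # Gaussian activations; subtracting each row's minimum distance avoids underflow
    # and cancels out in the normalization below
    weights = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2.0 * sigma ** 2))

    # Summation and output layers: kernel-weighted average of the stored targets
    return (weights @ y_train) / weights.sum(axis=1, keepdims=True)


# "Training" is single-pass: the network simply keeps the (input, target) pairs
X = np.array([[0.0], [1.0], [2.0]])
Y = np.array([[0.0], [10.0], [20.0]])
print(grnn_predict(X, Y, np.array([[0.5], [1.5]]), sigma=0.5))
```

The spread controls the trade-off between memorization and smoothing: a very small sigma returns the target of the nearest stored frame, while a very large sigma returns the average of all stored targets, so in practice it would be chosen on held-out data. MATLAB's `newgrnn` function exposes a comparable `spread` argument, although it scales the kernel differently.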
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license allows others to share and adapt the material, provided they give appropriate credit, provide a link to the license, and indicate if changes were made; anyone who remixes, transforms, or builds upon the material must distribute their contributions under the same license as the original.