Data Generation for Speech Recognition based on Generative Adversarial Networks

Authors

  • R. Lavanya, Assistant Professor, Department of Computing Technologies, SRM Institute of Science and Technology, Kattankulathur
  • K.B. Kishore Mohan, Professor & Head, Department of Bio Medical Engineering, Sri Shanmugha College of Engineering and Technology (SSCET), Sankari, Salem
  • G. Gomathy, HOD, EEE Department, Jaya Engineering College, Chennai-24, Thiruvallur District, Tamil Nadu
  • Appana Naga Lakshmi, Assistant Professor, Artificial Intelligence, Madanapalle Institute of Technology & Science, Madanapalle, A.P.
  • R. Salini, Assistant Professor, Department of CSE, Panimalar Engineering College, Chennai

Keywords:

Generative Adversarial Networks, Speech Recognition, Speech Generation, SEGAN

Abstract

Individuals who are deaf or have speech impairments are likely to derive additional benefit from a speech recognition system that uses GANs. Even under distracting external conditions, individuals will find it straightforward to grasp the information. The speech enhancement approaches prevalent today operate in the frequency domain and/or take advantage of higher-level features. Many of them rely on first-order statistics and address only a restricted set of noise scenarios. Deep networks are increasingly being adopted to overcome these drawbacks because of their ability to learn challenging tasks from sizable datasets. In this paper, a GAN-based strategy is proposed for generating synthetic data for speech emotion recognition. More specifically, we look into using GANs to generate additional training data. We examine the use of Generative Adversarial Networks (GANs) for training-data enrichment, yielding samples for under-represented emotions. The generated samples demonstrate the proposed model's viability, and evaluations from both specialists and laypeople support its efficacy. In doing so, we begin investigating generative architectures for speech enhancement, which may gradually incorporate more speech-centric design choices to improve their performance.
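As a rough illustration of the adversarial training the abstract describes, the sketch below trains a small generator/discriminator pair on fixed-length acoustic feature frames so the generator can later emit synthetic frames for augmentation. This is a minimal PyTorch sketch under assumed settings (NOISE_DIM, FEAT_DIM, the placeholder real data), not the authors' implementation or the SEGAN architecture.

import torch
import torch.nn as nn

# Illustrative dimensions (assumptions, not taken from the paper)
NOISE_DIM = 100
FEAT_DIM = 128   # e.g. number of mel-spectrogram bins per frame
BATCH = 64

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NOISE_DIM, 256), nn.ReLU(),
            nn.Linear(256, FEAT_DIM), nn.Tanh(),   # outputs scaled to [-1, 1]
        )
    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(FEAT_DIM, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1),                     # real/fake logit
        )
    def forward(self, x):
        return self.net(x)

G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_batch):
    # Discriminator step: separate real frames from generated ones
    z = torch.randn(real_batch.size(0), NOISE_DIM)
    fake = G(z).detach()
    loss_d = bce(D(real_batch), torch.ones(real_batch.size(0), 1)) + \
             bce(D(fake), torch.zeros(real_batch.size(0), 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step: produce frames the discriminator labels as real
    z = torch.randn(real_batch.size(0), NOISE_DIM)
    loss_g = bce(D(G(z)), torch.ones(real_batch.size(0), 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()

# Placeholder batch standing in for normalized feature frames of a rare emotion class
real = torch.rand(BATCH, FEAT_DIM) * 2 - 1
print(train_step(real))

In practice one such generator could be trained per under-represented emotion class (or a single conditional generator could be used), and the generated frames mixed into the training set of the recognizer.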


References

Adiga, N., Pantazis, Y., Tsiaras, V., & Stylianou, Y. (2019, September). Speech Enhancement for Noise-Robust Speech Synthesis Using Wasserstein GAN. In INTERSPEECH (pp. 1821-1825).

Wang, K., Zhang, J., Sun, S., Wang, Y., Xiang, F., & Xie, L. (2018). Investigating generative adversarial networks based speech dereverberation for robust speech recognition. arXiv preprint arXiv:1803.10132.

Pascual, S., Bonafonte, A., & Serra, J. (2017). SEGAN: Speech enhancement generative adversarial network. arXiv preprint arXiv:1703.09452.

Qian, Y., Hu, H., & Tan, T. (2019). Data augmentation using generative adversarial networks for robust speech recognition. Speech Communication, 114, 1-9.

Vijay, I., Banwari, H., Saluja, G., & Khatri, A. (2021). Improving speech enhancement using generative adversarial networks (SEGAN) by using multistage-enhancement. International Research Journal of Modernization in Engineering Technology and Science, 3(6), 214-217.

Donahue, C., Li, B., & Prabhavalkar, R. (2018, April). Exploring speech enhancement with generative adversarial networks for robust speech recognition. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5024-5028). IEEE.

Sriram, A., Jun, H., Gaur, Y., & Satheesh, S. (2018, April). Robust speech recognition using generative adversarial networks. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5639-5643). IEEE.

Benzeghiba, M., De Mori, R., Deroo, O., Dupont, S., Erbes, T., Jouvet, D., ... & Wellekens, C. (2007). Automatic speech recognition and speech variability: A review. Speech Communication, 49(10-11), 763-786.

Chatziagapi, A., Paraskevopoulos, G., Sgouropoulos, D., Pantazopoulos, G., Nikandrou, M., Giannakopoulos, T., ... & Narayanan, S. (2019, September). Data Augmentation Using GANs for Speech Emotion Recognition. In Interspeech (pp. 171-175).

Hu, H., Tan, T., & Qian, Y. (2018, April). Generative adversarial networks based data augmentation for noise robust speech recognition. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5044-5048). IEEE.

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... & Bengio, Y. (2014). Generative adversarial nets. Advances in neural information processing systems, 27.

Yi, L., & Mak, M. W. (2020). Improving speech emotion recognition with adversarial data augmentation network. IEEE Transactions on Neural Networks and Learning Systems, 33(1), 172-184.

Phan, H., McLoughlin, I. V., Pham, L., Chén, O. Y., Koch, P., De Vos, M., & Mertins, A. (2020). Improving GANs for speech enhancement. IEEE Signal Processing Letters, 27, 1700-1704.

Wu, J., Hua, Y., Yang, S., Qin, H., & Qin, H. (2019). Speech enhancement using generative adversarial network by distilling knowledge from statistical method. Applied Sciences, 9(16), 3396.

Ghai, B., Ramanan, B., & Mueller, K. (2019). Does speech enhancement of publicly available data help build robust speech recognition systems?. arXiv preprint arXiv:1910.13488.

Victoria, D. A. H., Manikanthan, S. V., H. R., D. V., Wildan, M. A., & Kishore, K. H. (2023). Radar Based Activity Recognition using CNN-LSTM Network Architecture. International Journal of Communication Networks and Information Security (IJCNIS), 14(3), 303–312. https://doi.org/10.17762/ijcnis.v14i3.5630 (Original work published December 31, 2022).


Published

07.02.2024

How to Cite

Lavanya, R., Mohan, K. K., Gomathy, G., Lakshmi, A. N., & Salini, R. (2024). Data Generation for Speech Recognition based on Generative Adversarial Networks. International Journal of Intelligent Systems and Applications in Engineering, 12(15s), 126–135. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/4724

Section

Research Article