Data Generation for Speech Recognition based on Generative Adversarial Networks
Keywords:
Generative Adversarial Networks, Speech Recognition, Speech Generation, SEGAN

Abstract
Individuals who are deaf or have speech impairments stand to gain particular benefit from a speech recognition system that uses GANs: even under distracting outside conditions, listeners can still grasp the spoken information. Most speech enhancement approaches in current use operate in the frequency domain and/or rely on higher-level features; many of them use only first-order statistics and address a restricted set of noise scenarios. Deep networks are increasingly adopted to overcome these drawbacks because of their ability to learn challenging tasks from large sample datasets. In this paper, a GAN-based strategy is proposed for generating synthetic data for speech emotion recognition. More specifically, we investigate using GANs to augment the training data, producing additional samples for under-represented emotion classes. The generated samples demonstrate the proposed model's viability, and evaluations from both experts and laypeople support its effectiveness. In doing so, we begin exploring generative architectures for speech enhancement, which may gradually incorporate more speech-centric design choices to improve their performance.
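To make the augmentation idea concrete, the following is a minimal sketch, not the authors' implementation: it shows a vanilla GAN trained to generate fixed-length acoustic feature vectors for a minority emotion class, which could then be added to the training set. The 40-dimensional features, 16-dimensional latent space, network sizes, hyperparameters, and the random stand-in data are illustrative assumptions; PyTorch is chosen only for the sake of the example.

# Illustrative sketch of GAN-based data augmentation for an
# under-represented emotion class (assumed setup, not the paper's code).
import torch
import torch.nn as nn

FEAT_DIM, NOISE_DIM = 40, 16          # assumed feature / latent sizes

generator = nn.Sequential(            # maps noise -> synthetic feature vector
    nn.Linear(NOISE_DIM, 64), nn.LeakyReLU(0.2),
    nn.Linear(64, FEAT_DIM),
)
discriminator = nn.Sequential(        # scores a feature vector as real/fake
    nn.Linear(FEAT_DIM, 64), nn.LeakyReLU(0.2),
    nn.Linear(64, 1), nn.Sigmoid(),
)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCELoss()

# Stand-in for real acoustic feature vectors of the minority emotion class.
real_features = torch.randn(256, FEAT_DIM)

for step in range(1000):
    idx = torch.randint(0, real_features.size(0), (32,))
    real = real_features[idx]
    fake = generator(torch.randn(32, NOISE_DIM))

    # Discriminator update: push real samples toward 1, generated toward 0.
    opt_d.zero_grad()
    d_loss = bce(discriminator(real), torch.ones(32, 1)) + \
             bce(discriminator(fake.detach()), torch.zeros(32, 1))
    d_loss.backward()
    opt_d.step()

    # Generator update: try to make the discriminator output 1 for fakes.
    opt_g.zero_grad()
    g_loss = bce(discriminator(fake), torch.ones(32, 1))
    g_loss.backward()
    opt_g.step()

# Synthetic feature vectors that could augment the minority-class training data.
augmented = generator(torch.randn(100, NOISE_DIM)).detach()

In practice the generator would be conditioned on the emotion label or trained per class, and the synthetic vectors would be mixed with the real ones before training the emotion classifier; the sketch above only illustrates the adversarial training loop itself.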