Fine-tuning ASR Model Performance on Indian Regional Accents for Accurate Chemical Term Prediction in Audio
Keywords:
Automatic Speech Recognition, ASR models, Chemical term identification in Indian regional accents, Deep Speech, Fine-tuning ASR, Wav2Vec, Whisper, Performance evaluation
Abstract
Automatic Speech Recognition (ASR) models have recently gained prominence for their ability to produce highly accurate transcriptions of human speech, and they remain an active area of research and development. This study compares three state-of-the-art ASR models: Deep Speech, Wav2Vec, and Whisper, evaluating their performance on a dataset of audio recordings containing chemical terms spoken in various Indian regional accents. The aim is to identify the model that transcribes chemical terms spoken in Indian regional accents most accurately and to fine-tune it further for efficient prediction.
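A minimal sketch of the kind of comparison the abstract describes is shown below, using publicly available Hugging Face checkpoints for Wav2Vec 2.0 and Whisper and word error rate (WER) as the metric. The audio file paths and reference transcripts are hypothetical placeholders, not the paper's actual dataset.

```python
# Sketch: compare off-the-shelf ASR checkpoints on accented audio of chemical
# terms by word error rate (WER). Model names are real public checkpoints;
# the audio paths and reference transcripts below are placeholders.
from transformers import pipeline
import jiwer

# Hypothetical evaluation set: (audio file, reference transcript)
samples = [
    ("audio/benzene_tamil_accent.wav", "benzene reacts with chlorine"),
    ("audio/sulphuric_acid_bengali_accent.wav", "dilute sulphuric acid"),
]

models = {
    "wav2vec2": "facebook/wav2vec2-base-960h",
    "whisper": "openai/whisper-small",
}

for name, checkpoint in models.items():
    asr = pipeline("automatic-speech-recognition", model=checkpoint)
    references, hypotheses = [], []
    for path, reference in samples:
        hypotheses.append(asr(path)["text"].lower())
        references.append(reference.lower())
    # Lower WER indicates more accurate transcription of the chemical terms
    print(f"{name}: WER = {jiwer.wer(references, hypotheses):.3f}")
```

The better-scoring model would then be fine-tuned on accent-specific recordings, as the study proposes.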