Fine-tuning ASR Model Performance on Indian Regional Accents for Accurate Chemical Term Prediction in Audio

Sonali Kothari; Shwetambari Chiwhane; Rithwik Satya; Md. Asad Ansari; Shreeja Mehta; Pranav Naranatt; M. Karthikeyan

Authors

Sonali Kothari, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
Shwetambari Chiwhane, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
Rithwik Satya, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
Md. Asad Ansari, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
Shreeja Mehta, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
Pranav Naranatt, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
M. Karthikeyan, NCL-CSIR, Baner, Pune

Keywords:

Automatic Speech Recognition, ASR models, Chemical term identification in Indian regional accents, Deep Speech, Fine-tuning ASR, Wav2Vec, Whisper, Performance evaluation

Abstract

Automatic Speech Recognition (ASR) models have recently gained prominence for their ability to produce highly accurate transcriptions of human speech, and they remain an active area of research and development. This study compares three state-of-the-art ASR models: Deep Speech, Wav2Vec, and Whisper. Their performance is evaluated on a dataset of audio recordings containing chemical terms spoken in various Indian regional accents. The aim is to identify the model that most accurately transcribes chemical terms spoken in Indian regional accents and to fine-tune it further for efficient prediction.
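
The abstract does not say which metric was used to compare the models. As a minimal sketch, assuming word error rate (WER), a common measure of transcription accuracy, the comparison could look like the following. The function name, the model labels, and the transcripts are hypothetical and invented for illustration; they are not taken from the study's dataset or results.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts (assumption: not real model outputs).
reference = "add sodium chloride to the solution"
outputs = {
    "model_a": "add sodium chloride to the solution",
    "model_b": "add sodium fluoride to solution",
}

# Score every candidate against the same reference and pick the lowest WER.
scores = {name: word_error_rate(reference, text) for name, text in outputs.items()}
best_model = min(scores, key=scores.get)
```

Under this assumption, the model with the lowest WER on the accented chemical-term recordings would be the candidate for further fine-tuning. A term-level accuracy restricted to the chemical vocabulary would be an equally plausible choice of metric.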


