Enhancing Automatic Speech Recognition System Performance for Punjabi Language through Feature Extraction and Model Optimization

Authors

  • Manoj Devare AIIT-AUM, Amity University, Navi Mumbai, India
  • Manish Thakral AIIT-AUM, Amity University, Navi Mumbai, India

Keywords:

Automatic Speech Recognition (ASR), Punjabi Language, Feature Extraction, Deep Learning, Acoustic Model, Language Modeling, Phonetics, Data Augmentation, Performance Evaluation, Contextual Features, Mel-Frequency Cepstral Coefficients (MFCCs), Word Error Rate (WER), Cross-Entropy Loss, Linguistic Variation

Abstract

Automatic Speech Recognition (ASR) systems have revolutionised human-computer interaction by enabling machines to transcribe spoken language into text. While ASR technology has made significant strides in many languages, it remains challenging for languages with limited resources and unique phonological characteristics, such as Punjabi. This paper presents an in-depth investigation into improving the performance of an ASR system designed for the Punjabi language by extracting key features that significantly influence its accuracy and efficiency. Punjabi, spoken by millions of people worldwide, exhibits distinctive phonetic and linguistic properties that pose challenges for ASR systems. To address these challenges, this study employs advanced feature extraction techniques to capture and represent the characteristic acoustic and linguistic properties of Punjabi speech that are pivotal for accurate recognition.
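The keywords and abstract name Word Error Rate (WER) as the evaluation metric. As a hedged illustration (not the paper's own implementation), WER is conventionally computed as the word-level Levenshtein edit distance between a reference transcript and the ASR hypothesis, divided by the reference length:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions)
    divided by the number of reference words, via word-level
    Levenshtein distance. Illustrative sketch only."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming edit-distance table:
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution / match
            )
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# One extra word inserted against a 3-word reference -> WER = 1/3
print(wer("the cat sat", "the cat sat down"))
```

For word-based scripts such as Gurmukhi-written Punjabi, the same metric applies after whitespace tokenisation; character error rate (CER) is sometimes reported alongside it for morphologically rich languages.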


References

Davis, S., & Mermelstein, P. (1980). Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing, 28(4), 357-366.

Abdel-Hamid, O., Mohamed, A. R., & Jiang, H. (2012). Convolutional neural networks for speech recognition. In 2012 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 4277-4280). IEEE.

Graves, A., Mohamed, A. R., & Hinton, G. (2013). Speech recognition with deep recurrent neural networks. In 2013 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 6645-6649). IEEE.

Amodei, D., Ananthanarayanan, S., Anubhai, R., Bai, J., Battenberg, E., Case, C., ... & Zheng, Z. (2016). Deep speech 2: End-to-end speech recognition in English and Mandarin. In International conference on machine learning (pp. 173-182).

Bahl, L. R., Jelinek, F., & Mercer, R. L. (1983). A maximum likelihood approach to continuous speech recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, (2), 179-190.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems (Vol. 30).

Ko, T., Peddinti, V., Povey, D., & Khudanpur, S. (2015). Audio augmentation for speech recognition. In Sixteenth annual conference of the international speech communication association.

Jurafsky, D., & Martin, J. H. (2009). Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. Pearson Education.


Published

13.12.2023

How to Cite

Devare, M., & Thakral, M. (2023). Enhancing Automatic Speech Recognition System Performance for Punjabi Language through Feature Extraction and Model Optimization. International Journal of Intelligent Systems and Applications in Engineering, 12(8s), 307–313. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/4122

Issue

Section

Research Article