Enhancing Automatic Speech Recognition System Performance for Punjabi Language through Feature Extraction and Model Optimization
Keywords:
Automatic Speech Recognition (ASR), Punjabi Language, Feature Extraction, Deep Learning, Acoustic Model, Language Modeling, Phonetics, Data Augmentation, Performance Evaluation, Contextual Features, Mel-Frequency Cepstral Coefficients (MFCCs), Word Error Rate (WER), Cross-Entropy Loss, Linguistic Variation
Abstract
Automatic Speech Recognition (ASR) systems have revolutionised human-computer interaction by enabling machines to transcribe spoken language into text. While ASR technology has made significant strides in many languages, it remains challenging for languages with limited resources and distinctive phonological characteristics, such as Punjabi. This paper presents an in-depth investigation into improving the performance of an ASR system for the Punjabi language through the extraction of key features that significantly influence its accuracy and efficiency. Punjabi, spoken by millions of people worldwide, exhibits phonetic and linguistic properties, including tonal contrasts, that pose challenges for ASR systems. To address these challenges, this study employs advanced feature extraction techniques to capture and represent the distinctive characteristics of Punjabi speech, with the aim of identifying the acoustic and linguistic properties that are pivotal for accurate recognition.
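To make the feature extraction stage described above concrete, the sketch below computes MFCCs with first- and second-order deltas from a speech recording using the open-source librosa library. This is a minimal illustration, not the study's exact pipeline: the file name punjabi_utterance.wav is a placeholder, and the sampling rate, frame length, and hop length are common ASR defaults assumed here for demonstration.

```python
# Minimal sketch: MFCC + delta feature extraction for one utterance.
# Assumes a mono WAV file; "punjabi_utterance.wav" is a placeholder name.
import librosa
import numpy as np

# Load the recording, resampling to 16 kHz (a common rate for ASR front ends).
signal, sr = librosa.load("punjabi_utterance.wav", sr=16000)

# 13 MFCCs per frame with a 25 ms window and 10 ms hop
# (400 and 160 samples respectively at 16 kHz).
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13,
                            n_fft=400, hop_length=160)

# First- and second-order deltas capture local spectral dynamics,
# useful for modelling phonetic and tonal contrasts in Punjabi speech.
delta1 = librosa.feature.delta(mfcc)
delta2 = librosa.feature.delta(mfcc, order=2)

# Stack into a 39-dimensional feature vector per frame (13 + 13 + 13).
features = np.vstack([mfcc, delta1, delta2]).T  # shape: (frames, 39)

# Per-utterance cepstral mean and variance normalisation (CMVN),
# a standard step before feeding features to an acoustic model.
features = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-8)
print(features.shape)
```

Frame-level features of this kind would then feed the acoustic model, whose output is typically scored against reference transcripts using Word Error Rate (WER), the evaluation metric listed in the keywords.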
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license lets readers remix, transform, and build upon the material, provided they give appropriate credit, provide a link to the license, indicate whether changes were made, and distribute their contributions under the same license as the original.