Automatic Speech Recognition System for Low Resource Punjabi Language using Deep Neural Network-Hidden Markov Model (DNN-HMM)

Authors

  • Rajni Sobti Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India and University Institute of Engineering & Technology, Panjab University, Chandigarh, India-160014
  • Kalpna Guleria Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India
  • Virender Kadyan Speech and Language Research Centre, School of Computer Sciences, University of Petroleum and Energy Studies, Dehradun, Uttarakhand, India-248007

Keywords:

Children Automatic Speech Recognition, Low Resource Language, Punjabi Speech, Data Collection, Deep Neural Networks, DNN-HMM

Abstract

In recent years, speech recognition technology has advanced significantly, enabling seamless human-machine interaction. Most of these advances, however, have focused on major languages with abundant data and resources, neglecting the rich linguistic diversity of low resource languages. Speech recognition for low resource languages poses unique challenges because comprehensive linguistic resources and data are lacking. To ensure inclusivity and promote global accessibility, researchers recognize the need to bridge this gap. This article focuses on the development of a children's ASR system for the Punjabi language, along with the potential benefits of addressing this understudied field. For this purpose, speech data were collected from Punjabi-speaking children; the collected audio was segmented using PRAAT (open source software), and the segmented audio files were then transcribed. Feature extraction was implemented using the MFCC algorithm. Acoustic modelling was carried out with a sequence of models, namely MONO, Tri1, Tri2 and Tri3. The acoustic model was then trained with a DNN-HMM to increase the accuracy of the children's ASR system for Punjabi. The results reveal 83.9% accuracy for children's ASR in the Punjabi language, and comparison with existing models shows that the proposed DNN-HMM model gives better results.
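The MFCC front end summarized in the abstract can be illustrated with a minimal, self-contained sketch. This is a generic MFCC computation using only NumPy, not the authors' exact configuration; the frame length, hop, filterbank size and cepstral count (25 ms / 10 ms / 26 / 13 at 16 kHz) are common defaults assumed here for illustration.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, frame_len=400, hop=160,
         n_mels=26, n_ceps=13):
    """MFCC sketch: pre-emphasis, framing, windowing, power spectrum,
    mel filterbank, log compression, DCT-II."""
    # Pre-emphasis boosts high frequencies
    emph = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Frame the signal (25 ms frames, 10 ms hop at 16 kHz) and window it
    n_frames = 1 + (len(emph) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emph[idx] * np.hamming(frame_len)
    # Power spectrum of each frame (rfft zero-pads to n_fft)
    pow_spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank
    def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
    def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    log_mel = np.log(pow_spec @ fbank.T + 1e-10)
    # DCT-II (unnormalized) decorrelates the log filterbank energies
    n = np.arange(n_mels)
    k = np.arange(n_ceps)
    dct_mat = np.cos(np.pi * (n[:, None] + 0.5) * k[None, :] / n_mels)
    return log_mel @ dct_mat

# Example: 1 second of a synthetic 440 Hz tone at 16 kHz
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
feats = mfcc(sig)
print(feats.shape)  # → (98, 13): 98 frames, 13 cepstral coefficients
```

In a full pipeline such as the one described above, these per-frame feature vectors would feed the monophone/triphone GMM-HMM stages and, finally, the DNN-HMM acoustic model.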


References

Alim, S. A., & Rashid, N. K. A. (2018). Some commonly used speech feature extraction algorithms (pp. 2-19). London, UK: IntechOpen. http://dx.doi.org/10.5772/intechopen.80419

Bawa, P., & Kadyan, V. (2021). Noise robust in-domain children speech enhancement for automatic Punjabi recognition system under mismatched conditions. Applied Acoustics, 175, 107810. https://doi.org/10.1016/j.apacoust.2020.107810

Besacier, L., Barnard, E., Karpov, A., & Schultz, T. (2014). Automatic speech recognition for under-resourced languages: A survey. Speech communication, 56, 85-100. https://doi.org/10.1016/j.specom.2013.07.008

Bhardwaj, V., Kukreja, V., & Singh, A. (2021). Usage of prosody modification and acoustic adaptation for robust automatic speech recognition (ASR) system. Revue d'Intelligence Artificielle, 35(3), 235-242. https://doi.org/10.18280/ria.350307

Bhardwaj, V., & Kukreja, V. (2021). Effect of pitch enhancement in Punjabi children's speech recognition system under disparate acoustic conditions. Applied Acoustics, 177, 107918. https://doi.org/10.1016/j.apacoust.2021.107918

Chohan, M. N., & García, M. I. M. (2019). Phonemic comparison of English and punjabi. International Journal of English Linguistics, 9(4), 347-357. https://doi.org/10.5539/ijel.v9n4p347

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533-536.

Deka, B., Chakraborty, J., Dey, A., Nath, S., Sarmah, P., Nirmala, S. R., & Vijaya, S. (2018). Speech corpora of under resourced languages of north-east India. In 2018 Oriental COCOSDA-International Conference on Speech Database and Assessments (pp. 72-77). IEEE. https://doi.org/10.1109/ICSDA.2018.8693038

Dua, M., Aggarwal, R. K., & Biswas, M. (2018). Optimizing integrated features for Hindi automatic speech recognition system. Journal of Intelligent Systems, 29(1), 959-976. https://doi.org/10.1515/jisys-2018-0057

Guglani, J., & Mishra, A. N. (2020). Automatic speech recognition system with pitch dependent features for Punjabi language on KALDI toolkit. Applied Acoustics, 167, 107386. https://doi.org/10.1016/j.apacoust.2020.107386

Gupta, S., Jaafar, J., Ahmad, W. W., & Bansal, A. (2013). Feature extraction using MFCC. Signal & Image Processing: An International Journal, 4(4), 101-108. https://doi.org/10.5121/sipij.2013.4408

Hermansky, H. (1990). Perceptual linear predictive (PLP) analysis of speech. Journal of the Acoustical Society of America, 87(4). https://doi.org/10.1121/1.399423

Hasegawa-Johnson, M. A., Jyothi, P., McCloy, D., Mirbagheri, M., Di Liberto, G. M., Das, A., & Lee, A. K. C. (2016). ASR for under-resourced languages from probabilistic transcription. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(1), 50-63. https://doi.org/10.1109/TASLP.2016.2621659

Hasija, T., Kadyan, V., & Guleria, K. (2021, March). Recognition of children Punjabi speech using tonal non-tonal classifier. In 2021 International Conference on Emerging Smart Computing and Informatics (ESCI) (pp. 702-706). IEEE.

Hasija, T., Kadyan, V., & Guleria, K. (2021, August). Out domain data augmentation on Punjabi children speech recognition using Tacotron. In Journal of Physics: Conference Series (Vol. 1950, No. 1, p. 012044). IOP Publishing.

Hasija, T., Kadyan, V., Guleria, K., Alharbi, A., Alyami, H., & Goyal, N. (2022). Prosodic feature-based discriminatively trained low resource speech recognition system. Sustainability, 14(2), 614.

Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A. R., Jaitly, N., ... & Kingsbury, B. (2012). Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal processing magazine, 29(6), 82-97. https://doi.org/10.1109/MSP.2012.2205597

https://punjabi.lrc.columbia.edu/?page_id=11

Guglani, J. (2022). Continuous speech recognition of Punjabi language. Ph.D. dissertation, Dr. A.P.J. Abdul Kalam Technical University, Lucknow, Uttar Pradesh.

Kadyan, V. (2018). Acoustic features optimization for Punjabi automatic speech recognition system. Ph.D. dissertation, Chitkara University, Rajpura, India.

Kadyan, V., Mantri, A., & Aggarwal, R. K. (2017). A heterogeneous speech feature vectors generation approach with hybrid HMM classifiers. International Journal of Speech Technology, 20, 761-769. https://doi.org/10.1007/s10772-017-9446-9

Kherdekar, V. A., & Naik, S. A. (2019). Speech recognition system approaches, techniques and tools for mathematical expressions: A review. International Journal of Scientific & Technology Research, 8(8), 1255-1263. https://api.semanticscholar.org/CorpusID:202890136

Kim, C., & Stern, R. M. (2012, May). Power-normalized cepstral coefficients (PNCC) for robust speech recognition. In Proc. of ICASSP. https://doi.org/10.1109/TASLP.2016.2545928

Lata, S., & Arora, S. (2013, August). Laryngeal tonal characteristics of Punjabi—an experimental study. In 2013 International Conference on Human Computer Interactions (ICHCI) (pp. 1-6). https://doi.org/10.1109/ICHCI-IEEE.2013.6887793

Liu, Z., Wu, Z., Li, T., Li, J., & Shen, C. (2018). GMM and CNN hybrid method for short utterance speaker recognition. IEEE Transactions on Industrial Informatics, 14(7), 3244-3252. https://doi.org/10.1109/TII.2018.2799928

Lu, X., Li, S., & Fujimoto, M. (2019). Automatic speech recognition. In Speech-to-Speech Translation, SpringerBriefs in Computer Science. Springer. https://doi.org/10.1007/978-981-15-0595-9_2

Schedl, M., Yang, Y.-H., & Herrera-Boyer, P. (2016). Introduction to intelligent music systems and applications. ACM Transactions on Intelligent Systems and Technology, 8(2), 17:1-17:8. https://doi.org/10.1145/2991468

Nagano, T., Fukuda, T., Suzuki, M., Kurata, G. (2019). Data augmentation based on vowel stretch for improving children's speech recognition. In 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 502-508. https://doi.org/10.1109/ASRU46091.2019.9003741

Noyes, J. M., Haigh, R., & Starr, A. F. (1989). Automatic speech recognition for disabled people. Applied Ergonomics, 20(4), 293-298. https://doi.org/10.1016/0003-6870(89)90193-2

Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, and others. 2015. Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 115, 3 (Dec. 2015), 211–252. https://doi.org/10.1007/s11263-015-0816-y

Serizel, R., & Giuliani, D. (2017). Deep-neural network approaches for speech recognition with heterogeneous groups of speakers including children. Natural Language Engineering, 23(3), 325-350. https://doi.org/10.1017/S135132491600005X

Shahnawazuddin, S., Adiga, N., Kathania, H.K., Sai, B. T. (2020). Creating speaker independent ASR system through prosody modification based data augmentation. Pattern Recognition Letters, 131: 213-218. https://doi.org/10.1016/j.patrec.2019.12.019

Shivakumar, P.G., Potamianos, A., Lee, S., Narayanan, S. (2014). Improving speech recognition for children using acoustic adaptation and pronunciation modeling. In WOCCI, 15-19.

Shi, L., Ahmad, I., He, Y., & Chang, K. (2018). Hidden Markov model based drone sound recognition using MFCC technique in practical noisy environments. Journal of Communications and Networks, 20(5), 509-518. https://doi.org/10.1109/JCN.2018.000075

Sobti, R., Kadyan, V., & Guleria, K. (2022). Challenges for Designing of Children Speech Corpora: A State-of-the-Art Review. ECS Transactions, 107(1), 9053. https://doi.org/10.1149/10701.9053ecst

Dhanjal, S. S. (2014). Speech analysis and synthesis of the Punjabi language. Ph.D. dissertation, Thapar University.

Berment, V. (2004, May). Methods to computerize "little equipped" languages and groups of languages. Ph.D. thesis, Université Joseph Fourier - Grenoble I. https://tel.archives-ouvertes.fr/tel-00006313

Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, and others. 2015. Human-level control through deep reinforcement learning. Nature 518, 7540 (Feb. 2015), 529–533. https://doi.org/10.1038/nature14236

Yan Liu, Yang Liu, Shenghua Zhong, and Songtao Wu. 2017. Implicit visual learning: Image recognition via dissipative learning model. ACM Trans. Intell. Syst. Technol. 8, 2 (Jan. 2017), 31:1–31:24. https://doi.org/10.1145/2974024

Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, and others. 2016. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv:1609.08144 (Oct. 2016) https://doi.org/10.48550/arXiv.1609.08144

Zhang, Z., Geiger, J., Pohjalainen, J., Mousa, A. E. D., Jin, W., & Schuller, B. (2018). Deep learning for environmentally robust speech recognition: An overview of recent developments. ACM Transactions on Intelligent Systems and Technology (TIST), 9(5), 1-28. https://doi.org/10.1145/3178115

Zixing Zhang, Nicholas Cummins, and Björn Schuller. 2017. Advanced data exploitation for speech analysis—An overview. IEEE Sign. Process. Mag. 34 (July 2017). https://doi.org/10.1109/MSP.2017.2699358

Hui, J. Speech recognition — Feature extraction MFCC & PLP. Medium. Accessed 21.01.2024.


Published

24.03.2024

How to Cite

Sobti, R. ., Guleria, K. ., & Kadyan, V. . (2024). Automatic Speech Recognition System for Low Resource Punjabi Language using Deep Neural Network-Hidden Markov Model (DNN-HMM). International Journal of Intelligent Systems and Applications in Engineering, 12(19s), 30–42. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/5042

Issue

Section

Research Article