Enhancing Automatic Speech Recognition with NLP Techniques for Low-Resource Languages

Authors

  • Madhukar Mulpuri, Rajesh Gadipuuri, Souptik Sen

Keywords:

speech, automatic, techniques, NLP, error, utility

Abstract

In this paper, we investigate how advanced Natural Language Processing (NLP) techniques can enhance Automatic Speech Recognition (ASR) systems for low-resource languages. To counter the difficulties arising from limited annotated speech data, we explore NLP methods such as data augmentation through adapters and multilingual modelling to boost ASR performance. We analyse the mechanisms by which these techniques improve recognition accuracy, especially in low-resource conditions, and show that linguistic patterns, phonetic knowledge, and contextual information can bridge the data gap when combined with unsupervised and semi-supervised learning. Experimental evaluations across multiple low-resource languages demonstrate significant reductions in Word Error Rate (WER), validating the utility and scalability of these approaches. These findings underscore the potential of combining NLP innovations with ASR systems to improve inclusivity and accessibility for under-researched linguistic communities. This research establishes a foundational framework for developing ASR systems that empower linguistically underserved populations, paving the way for greater linguistic equity and preservation in global communication technologies.
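
For readers unfamiliar with the evaluation metric named in the abstract, the sketch below shows how WER is conventionally computed: the word-level edit distance (substitutions, insertions, deletions) between a reference transcript and an ASR hypothesis, normalized by the reference length. The function name and example sentences are illustrative assumptions, not drawn from the paper.

# Minimal sketch (illustrative, not from the paper): Word Error Rate (WER)
# as the word-level Levenshtein distance between a reference transcript
# and an ASR hypothesis, normalized by reference length.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dp[i - 1][j] + 1
            insertion = dp[i][j - 1] + 1
            dp[i][j] = min(substitution, deletion, insertion)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution ("sat" -> "sit") and one deletion ("the") over six
# reference words gives WER = 2/6, roughly 0.33.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))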





Published

25.12.2019

How to Cite

Madhukar Mulpuri. (2019). Enhancing Automatic Speech Recognition with NLP Techniques for Low-Resource Languages. International Journal of Intelligent Systems and Applications in Engineering, 7(4), 245–254. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/7069

Issue

Section

Research Article