An Automated Speech Recognition System Using Whale Optimized Random Forest Algorithm

Authors

  • Urvashi Rawat, Assistant Professor, School of Engineering and Computer, Dev Bhoomi Uttarakhand University, Uttarakhand, India
  • Adlin Jebakumari, Assistant Professor, Department of Computer Science and IT, Jain (Deemed-to-be University), Bangalore-27, India
  • Ramesh Chandra Tripathi, Professor, College of Computing Science and Information Technology, Teerthanker Mahaveer University, Moradabad, Uttar Pradesh, India
  • Purnima Nag, Professor, School of Engineering & Technology, Jaipur National University, Jaipur, India

Keywords:

Automated Speech Recognition (ASR), Whale Optimization Algorithm (WOA), Whale Optimized Random Forest Algorithm (WO-RFA), Mel Frequency Cepstral Coefficients (MFCC), natural language processing

Abstract

Automated speech recognition (ASR) systems play a significant role in natural language processing and human-computer interaction applications. The richness and variety of speech signals, however, make it difficult to improve the accuracy and effectiveness of ASR systems. This study describes an ASR system that uses the Whale Optimized Random Forest Algorithm (WO-RFA) to improve speech recognition performance. The proposed ASR system combines the strength of the random forest algorithm with the optimization capabilities of the Whale Optimization Algorithm (WOA). The WOA is a nature-inspired metaheuristic algorithm based on the social behavior of humpback whales; it simulates their hunting behavior to find the best solutions in a given search space. By incorporating the WOA into the random forest algorithm, the system aims to improve the accuracy and resilience of speech recognition. The speech data are preprocessed using noise removal, feature extraction, and normalization techniques, which prepare the data for training and recognition. The preprocessed speech signals are then analyzed to identify relevant information, with features extracted using Mel Frequency Cepstral Coefficients (MFCC). To enhance the random forest's performance in speech recognition tasks, the WOA adjusts the forest's parameters and structure, tuning hyperparameters such as the number of trees, tree depth, and splitting criteria to improve the model's precision and effectiveness. The optimized WO-RFA model is incorporated into the ASR system, allowing it to convert speech inputs into text outputs in real time. The technology can be used for a variety of purposes, including voice assistants, transcription services, and voice-controlled systems.
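To make the pipeline described above concrete, the sketch below shows a minimal, hypothetical implementation of the two core steps: MFCC feature extraction from preprocessed utterances, and a simplified Whale Optimization Algorithm loop that tunes two of the random-forest hyperparameters mentioned in the abstract (number of trees and tree depth) against cross-validated accuracy. This is not the authors' implementation; the libraries used (librosa, scikit-learn, NumPy), the fitness function, the search bounds, and the placeholder variables wav_paths and labels are assumptions made purely for illustration.

    import numpy as np
    import librosa
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    def mfcc_features(path, sr=16000, n_mfcc=13):
        # Load one utterance and summarize its MFCCs into a fixed-length vector
        # (per-coefficient mean and standard deviation over time).
        signal, sr = librosa.load(path, sr=sr)
        mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
        return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

    def fitness(position, X, y):
        # Cross-validated accuracy of a random forest decoded from a whale position:
        # position[0] -> number of trees, position[1] -> maximum tree depth.
        n_trees = int(np.clip(round(position[0]), 10, 300))
        depth = int(np.clip(round(position[1]), 2, 40))
        clf = RandomForestClassifier(n_estimators=n_trees, max_depth=depth, random_state=0)
        return cross_val_score(clf, X, y, cv=3).mean()

    def whale_optimize(X, y, n_whales=8, n_iter=20, bounds=((10, 300), (2, 40))):
        # Simplified WOA: each whale either encircles the best solution, spirals
        # toward it (bubble-net attack), or explores around a random whale,
        # while the coefficient 'a' decays linearly from 2 to 0.
        lo = np.array([b[0] for b in bounds], dtype=float)
        hi = np.array([b[1] for b in bounds], dtype=float)
        whales = lo + np.random.rand(n_whales, len(bounds)) * (hi - lo)
        scores = np.array([fitness(w, X, y) for w in whales])
        best, best_score = whales[scores.argmax()].copy(), scores.max()
        for t in range(n_iter):
            a = 2.0 - 2.0 * t / n_iter
            for i in range(n_whales):
                r = np.random.rand(len(bounds))
                A, C = 2 * a * r - a, 2 * np.random.rand(len(bounds))
                if np.random.rand() < 0.5:
                    if np.all(np.abs(A) < 1):    # exploitation: move toward the best whale
                        whales[i] = best - A * np.abs(C * best - whales[i])
                    else:                        # exploration: move toward a random whale
                        rand = whales[np.random.randint(n_whales)].copy()
                        whales[i] = rand - A * np.abs(C * rand - whales[i])
                else:                            # bubble-net spiral around the best whale
                    l = np.random.uniform(-1, 1)
                    whales[i] = np.abs(best - whales[i]) * np.exp(l) * np.cos(2 * np.pi * l) + best
                whales[i] = np.clip(whales[i], lo, hi)
                score = fitness(whales[i], X, y)
                if score > best_score:
                    best, best_score = whales[i].copy(), score
        return best, best_score

    # Usage with placeholder data (wav_paths and labels would come from the speech corpus):
    # X = np.vstack([mfcc_features(p) for p in wav_paths])
    # y = np.array(labels)
    # best_position, best_accuracy = whale_optimize(X, y)

In a full system, the best position returned by the optimizer would be decoded into hyperparameters for the final WO-RFA classifier, and additional dimensions (for example, an encoded splitting criterion) could be appended to each whale's position vector in the same way.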

Published

04.11.2023

How to Cite

Rawat, U., Jebakumari, A., Tripathi, R. C., & Nag, P. (2023). An Automated Speech Recognition System Using Whale Optimized Random Forest Algorithm. International Journal of Intelligent Systems and Applications in Engineering, 12(3s), 384–390. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/3718

Issue

12(3s)

Section

Research Article