Leveraging Intelligent Voice Activity Detection to Elevate Speech Recognition Systems

Authors

  • Anita Mundra, Takshila Institute of Engineering and Technology, Jabalpur (MP)
  • N. Bargavi, Assistant Professor (Senior Grade), Faculty of Management, SRM Institute of Science and Technology, Vadapalani Campus, Chennai
  • Ritesh Sharma, Assistant Professor, GLA University, Mathura, Uttar Pradesh
  • Renu Vij, Associate Professor, University School of Business, Apex Institute of Technology (MBA), Chandigarh University, Gharuan, Mohali, India
  • Melanie Lourens, Deputy Dean, Faculty of Management Sciences, Durban University of Technology, South Africa
  • Charanjit Singh, Assistant Professor, Department of Computer Science & Engineering, MM Engineering College, Maharishi Markandeshwar (Deemed to be University), Mullana-Ambala, Haryana, India 133207
  • Anoop Beri, Professor, School of Education, Lovely Professional University, Phagwara

Keywords:

Speech Presence Probability, Mel Frequency Cepstral Coefficients, Perceptual Linear Prediction, Short-time Fourier Transform, Automatic Speech Recognition

Abstract

This work studies Automatic Speech Recognition (ASR) that remains practical in noisy conditions. Common parameterization strategies were evaluated for their robustness to background noise. The features considered are Mel frequency cepstral coefficients (MFCC), perceptual linear predictive (PLP) coefficients, and their modified versions, together with a hybrid feature extractor built by merging the basic blocks of PLP and MFCC. Frame dropping based on voice activity detection (VAD) was applied only during the training phase of the ASR system. The benefit of this technique is that it removes pauses and possibly severely distorted speech segments, which supports more accurate phone modeling. The second part focuses on the analysis and contribution of the modified voice activity detection technique.
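
The idea of VAD-based frame dropping can be illustrated with a minimal sketch. It is not the authors' modified detector: it assumes a simple log-energy threshold as a stand-in VAD, and the function names, the `margin_db` parameter, and the synthetic test signal are all hypothetical choices made for illustration.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split a sample sequence into overlapping frames (no padding)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def log_energy(frame, eps=1e-10):
    """Mean squared amplitude of one frame, expressed in dB."""
    power = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(power + eps)


def vad_mask(frames, margin_db=15.0):
    """Flag a frame as speech if its energy is within margin_db of the loudest frame.

    margin_db is an assumed tuning value, not one taken from the paper.
    """
    energies = [log_energy(f) for f in frames]
    threshold = max(energies) - margin_db
    return [e >= threshold for e in energies]


def drop_nonspeech(frames, mask):
    """Keep only the frames flagged as speech (the frame-dropping step)."""
    return [f for f, keep in zip(frames, mask) if keep]


def demo_signal(sample_rate=8000):
    """Synthetic stand-in for an utterance: quiet hum, loud tone, quiet hum."""
    quiet = [0.001 * math.sin(2 * math.pi * 50 * n / sample_rate) for n in range(800)]
    loud = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(800)]
    return quiet + loud + quiet


frames = frame_signal(demo_signal(), frame_len=200, hop=100)
mask = vad_mask(frames)
training_frames = drop_nonspeech(frames, mask)
print(len(frames), "frames in,", len(training_frames), "frames kept for training")
```

In a setup like the one the abstract describes, the surviving frames would be passed to the feature extractor during training only, while recognition would still run on every frame.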

Published

13.12.2023

How to Cite

Mundra, A., Bargavi, N., Sharma, R., Vij, R., Lourens, M., Singh, C., & Beri, A. (2023). Leveraging Intelligent Voice Activity Detection to Elevate Speech Recognition Systems. International Journal of Intelligent Systems and Applications in Engineering, 12(8s), 265–270. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/4117

Section

Research Article
