Trans-BILSTM Based Speech and Speaker Recognition using Spectral, Cepstral and Deep Features
Keywords:
Long short-term memory (LSTM), Speech Recognition, Speaker Recognition

Abstract
This paper introduces a new speech and speaker recognition system that combines deep feature extraction with a heuristically adapted Transformer bidirectional long short-term memory (BiLSTM) network and an attention mechanism. Our approach addresses the difficulty of reliably recognising speech and identifying speakers in complex audio data. The system employs a bidirectional LSTM and an attention mechanism within a Transformer framework to capture temporal and long-range dependencies in voice input, improving recognition accuracy. We compare the proposed model with existing methods such as the flow detection algorithm and the reptile search algorithm. The experimental findings reveal that our system surpasses these algorithms in accuracy, precision, recall, and F1-score, demonstrating its value in speech and speaker recognition tasks. Our research advances deep learning-based audio analysis and provides a reliable solution for real-world applications that require precise speech and speaker recognition.
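To make the described architecture concrete, the following is a minimal sketch, not the authors' exact model, of a BiLSTM encoder followed by Transformer-style multi-head self-attention for utterance-level classification (e.g. speaker identification from cepstral frames). The layer sizes, pooling strategy, and class count are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BiLSTMAttentionClassifier(nn.Module):
    """Sketch of a BiLSTM + self-attention classifier over acoustic frames."""

    def __init__(self, n_features=39, hidden=128, n_heads=4, n_classes=10):
        super().__init__()
        # Bidirectional LSTM captures temporal context in both directions.
        self.bilstm = nn.LSTM(n_features, hidden, batch_first=True,
                              bidirectional=True)
        # Multi-head self-attention models long-range dependencies
        # across the BiLSTM output sequence.
        self.attn = nn.MultiheadAttention(embed_dim=2 * hidden,
                                          num_heads=n_heads,
                                          batch_first=True)
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):             # x: (batch, frames, n_features)
        h, _ = self.bilstm(x)         # (batch, frames, 2 * hidden)
        a, _ = self.attn(h, h, h)     # self-attention over time steps
        pooled = a.mean(dim=1)        # average-pool across frames
        return self.classifier(pooled)

model = BiLSTMAttentionClassifier()
# Two utterances of 100 frames with 39 MFCC-style features each.
logits = model(torch.randn(2, 100, 39))
print(logits.shape)  # torch.Size([2, 10])
```

In practice, the input frames would be spectral, cepstral, or learned deep features, and the pooled attention output could equally serve as a speaker embedding rather than feed a classifier head.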
License
![Creative Commons License](http://i.creativecommons.org/l/by-sa/4.0/88x31.png)
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
All papers should be submitted electronically. All submitted manuscripts must be original work that is not under submission at another journal or under consideration for publication in another form, such as a monograph or chapter of a book. Authors of submitted papers are obligated not to submit their paper for publication elsewhere until an editorial decision is rendered on their submission. Further, authors of accepted papers are prohibited from publishing the results in other publications that appear before the paper is published in the Journal unless they receive approval for doing so from the Editor-In-Chief.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license lets readers remix, transform, and build upon the material, provided they give appropriate credit, provide a link to the license, indicate if changes were made, and distribute their contributions under the same license as the original.