Trans-BILSTM Based Speech and Speaker Recognition using Spectral, Cepstral and Deep Features


  • Sukumar B. S., G. N. Kodanda Ramaiah, Sarika Raga, Lalitha Y. S.


Long short-term memory (LSTM), Speech Recognition, Speaker Recognition.


This paper introduces a speech and speaker recognition system that combines a deep feature extraction method with a Heuristic Adopted Transformer Bidirectional Long Short-Term Memory (LSTM) network and an attention mechanism. Our aim is to address the difficulty of recognising speech and identifying speakers accurately in complex audio data. The system uses a bidirectional LSTM and an attention mechanism within a Transformer framework to capture temporal and long-range dependencies in the voice input, which improves identification accuracy. We compare the proposed model with existing methods, namely the flow detection algorithm and the reptile search algorithm. The experimental findings show that our system surpasses these existing algorithms in accuracy, precision, recall, and F1-score, demonstrating its value for speech and speaker recognition tasks. Our research contributes to deep learning-based audio analysis and offers a reliable solution for real-world applications that need precise speech and speaker recognition.
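The abstract reports accuracy, precision, recall, and F1-score but does not say how they are aggregated across speakers or word classes. As a minimal sketch, assuming macro-averaging over classes (an assumption, not something the abstract states), the four metrics could be computed as follows. The function name `classification_metrics` and the speaker labels are hypothetical and used only for illustration.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro-averaging is an assumption made for this sketch; the source
    does not state which averaging scheme was used.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gets a false positive
            fn[truth] += 1   # true class gets a false negative

    def safe_div(num, den):
        # Define the ratio as 0.0 when a class has no predictions or no samples.
        return num / den if den else 0.0

    precisions, recalls, f1s = [], [], []
    for label in labels:
        precision = safe_div(tp[label], tp[label] + fp[label])
        recall = safe_div(tp[label], tp[label] + fn[label])
        f1 = safe_div(2 * precision * recall, precision + recall)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy speaker-identification example with two hypothetical speakers.
y_true = ["spk1", "spk1", "spk2", "spk2"]
y_pred = ["spk1", "spk2", "spk2", "spk2"]
example = classification_metrics(y_true, y_pred)
print(example)
```

In this toy case three of the four utterances are attributed correctly, so accuracy is 0.75, while macro precision is higher than macro recall because `spk1` is never predicted wrongly but is missed once.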








How to Cite

Sukumar B. S. (2024). Trans-BILSTM Based Speech and Speaker Recognition using Spectral, Cepstral and Deep Features. International Journal of Intelligent Systems and Applications in Engineering, 12(21s), 3709 –. Retrieved from



Research Article
