An Automated DTFWOA-ASI model for Classification and Identification of Speakers utilizing a Metaheuristic Algorithm

Authors

  • T.S. Mullai vendan, R. Thiruvengatanadhan, P. Dhanalakshmi

Keywords:

Speaker identification, Deep Learning, Whale optimization algorithm, VGGish, Spectrograms.

Abstract

Speaker identification is a classification task that aims to recognize an individual from sequential speech data. Because a speech signal is a one-dimensional, continuous time series, most contemporary studies rely on either convolutional neural networks (CNNs) or recurrent neural networks (RNNs). Both techniques have proven effective across various applications, yet combining the two for speaker recognition remains unexplored. A spectrogram computed from a speech signal exposes the spatial attributes of the voiceprint, reflecting the voice spectrum. This makes CNNs well suited to extracting spatial characteristics, in effect capturing the spectral correlations present in acoustic signatures. At the same time, because speech is time-sequential, deep RNNs model long utterances better than shallower networks. This study introduces a new model, Dual-Tier Feature Extraction with Whale Optimization Algorithm for Automated Speaker Identification (DTFWOA-ASI), designed to address the shortcomings of earlier models. The DTFWOA-ASI approach first applies average median filtering (AMF) to remove background noise from the sound recordings. It then feeds both Mel-frequency cepstral coefficients (MFCC) and spectrogram data into VGGish, a deep convolutional network, to extract salient features. A long short-term memory recurrent neural network (LSTM-RNN) performs the automatic identification and classification of speakers, and the Whale Optimization Algorithm (WOA) tunes the hyperparameters of the LSTM-RNN. The performance and accuracy of the DTFWOA-ASI framework were assessed in several experiments, and a comparative analysis shows that the model outperforms recent methods.
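
The abstract does not state the WOA settings or the hyperparameter search space, so the following is only a minimal sketch of the standard Whale Optimization Algorithm (encircling the best solution, random search, and the spiral update) rather than the authors' implementation. It draws one scalar coefficient A per whale, which is a common simplification. The function `toy_loss` and the two search dimensions (log10 of the learning rate and the number of hidden units) are hypothetical stand-ins for the validation loss of an LSTM-RNN.

```python
import numpy as np


def whale_optimize(objective, bounds, n_whales=20, n_iter=100, seed=0):
    """Minimize `objective` over box `bounds` with a basic WOA loop."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    whales = rng.uniform(lo, hi, size=(n_whales, lo.size))
    fitness = np.array([objective(w) for w in whales])
    best_idx = int(np.argmin(fitness))
    best, best_fit = whales[best_idx].copy(), float(fitness[best_idx])

    for t in range(n_iter):
        a = 2.0 * (1.0 - t / n_iter)  # decreases linearly from 2 toward 0
        for i in range(n_whales):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                # |A| < 1: encircle the best whale; otherwise explore
                # around a randomly chosen whale.
                if abs(A) < 1.0:
                    target = best
                else:
                    target = whales[rng.integers(n_whales)]
                new = target - A * np.abs(C * target - whales[i])
            else:
                # Spiral update toward the best whale (shape constant b = 1).
                l = rng.uniform(-1.0, 1.0)
                dist = np.abs(best - whales[i])
                new = dist * np.exp(l) * np.cos(2.0 * np.pi * l) + best
            whales[i] = np.clip(new, lo, hi)  # keep candidates inside bounds
            f = float(objective(whales[i]))
            if f < best_fit:
                best, best_fit = whales[i].copy(), f
    return best, best_fit


def toy_loss(x):
    """Hypothetical stand-in for validation loss; x = (log10 lr, hidden units)."""
    return (x[0] + 3.0) ** 2 + ((x[1] - 128.0) / 64.0) ** 2


# Assumed search space: log10(learning rate) in [-5, -1], hidden units in [16, 512].
SEARCH_BOUNDS = [(-5.0, -1.0), (16.0, 512.0)]
best_params, best_loss = whale_optimize(toy_loss, SEARCH_BOUNDS)
```

In a real tuning run, `toy_loss` would be replaced by a function that trains the network with the candidate hyperparameters and returns its validation error, which makes each evaluation far more expensive than in this toy setting.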

Published

24.03.2024

How to Cite

T.S. Mullai vendan. (2024). An Automated DTFWOA-ASI model for Classification and Identification of Speakers utilizing a Metaheuristic Algorithm. International Journal of Intelligent Systems and Applications in Engineering, 12(3), 3207–3215. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/5926

Section

Research Article