Empowering Accented Speech Analysis in Malayalam Through Cutting-Edge Fusion of Self Supervised Learning and Autoencoders


  • Rizwana Kallooravi Thandil Assistant Professor, Sullamussalam Science College, Areekode, Kerala, India
  • Mohamed Basheer K. P. Assistant Professor, Sullamussalam Science College, Areekode, Kerala, India
  • Muneer V. K. Assistant Professor, Sullamussalam Science College, Areekode, Kerala, India


Autoencoders, self-supervised learning, human-computer interface, accented speech recognition, Malayalam speech recognition


This research explores the use of autoencoders to handle accented speech data in the Malayalam language. The primary objective is to use an autoencoder to learn a compressed representation of the input data and then use that representation to train various machine learning models, with the aim of raising accuracy and lowering word error rates (WER). The study follows a two-step process. First, an autoencoder neural network encodes the accented speech data into a lower-dimensional latent-space representation. The encoder captures the essential features and patterns in the data, and the decoder reconstructs the original input from this compressed representation. Second, the encoded features are used to train several machine learning models: logistic regression, a decision tree classifier, a support vector machine (SVM), a random forest classifier (RFC), k-nearest neighbors (KNN), stochastic gradient descent (SGD), and a multilayer perceptron (MLP). Because these models take the encoded features as input, they learn from a compact representation of the accented speech data. Experimental results indicate that the models trained on the encoded features achieve higher accuracy than traditional approaches, which demonstrates the effectiveness of autoencoders in capturing and representing the significant characteristics of accented speech. Using the encoded features also leads to lower word error rates, indicating better transcription and recognition of accented Malayalam speech. These findings show the potential of autoencoders to improve the overall accuracy and efficiency of speech-processing tasks for accented speech.
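The two-step process described in the abstract (train an autoencoder, then train classifiers on its encoder output) can be sketched with scikit-learn. This is a minimal sketch under assumptions the abstract does not state: each utterance is already a fixed-length acoustic feature vector, the class label is some utterance-level category, and the autoencoder is a single `tanh` bottleneck layer built from `MLPRegressor` fit to reproduce its own input. `make_classification` stands in for real accented Malayalam data, and the helper names `fit_autoencoder`, `encode`, and `compare_classifiers` are hypothetical; this is not the authors' implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor, MLPClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier


def fit_autoencoder(X, latent_dim=8, seed=0):
    """Hypothetical helper: one-hidden-layer autoencoder (input -> latent -> input).

    MLPRegressor is trained with X as both input and target, so its hidden
    layer is the bottleneck: coefs_[0] acts as the encoder, coefs_[1] as the decoder.
    """
    ae = MLPRegressor(hidden_layer_sizes=(latent_dim,), activation="tanh",
                      max_iter=2000, random_state=seed)
    ae.fit(X, X)  # unsupervised: reconstruct the input, labels are not used
    return ae


def encode(ae, X):
    """Hypothetical helper: map inputs to the latent space with the encoder weights only."""
    return np.tanh(X @ ae.coefs_[0] + ae.intercepts_[0])


def compare_classifiers(X, y, latent_dim=8, seed=0):
    """Hypothetical helper: train the seven model types on encoded features, return test accuracy."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y)
    scaler = StandardScaler().fit(X_tr)  # fit scaling on training data only
    X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

    ae = fit_autoencoder(X_tr, latent_dim, seed)
    Z_tr, Z_te = encode(ae, X_tr), encode(ae, X_te)

    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        "svm": SVC(),
        "random_forest": RandomForestClassifier(random_state=seed),
        "knn": KNeighborsClassifier(),
        "sgd": SGDClassifier(random_state=seed),
        "mlp": MLPClassifier(max_iter=2000, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(Z_tr, y_tr)
        scores[name] = model.score(Z_te, y_te)  # mean accuracy on held-out data
    return scores


if __name__ == "__main__":
    # Assumption: synthetic stand-in data, not the paper's Malayalam corpus.
    X, y = make_classification(n_samples=300, n_features=40, n_informative=10,
                               n_classes=3, random_state=0)
    for name, acc in compare_classifiers(X, y).items():
        print(f"{name}: {acc:.3f}")
```

Extracting the encoder by hand from `coefs_[0]` and `intercepts_[0]` keeps the sketch short. A deeper or convolutional autoencoder, which the paper may well use, would need a framework such as PyTorch or Keras instead.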
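The abstract reports word error rate (WER) without defining it. The standard definition is WER = (S + D + I) / N, where S, D, and I are the substitutions, deletions, and insertions in a minimum-cost word-level alignment between the reference and the recognized transcript, and N is the number of reference words. The sketch below assumes the paper uses this standard definition; the function name `word_error_rate` is hypothetical, and the alignment is a plain Levenshtein distance over whitespace-separated words.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via word-level Levenshtein distance (standard definition, assumed)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution or match
    return dist[len(ref)][len(hyp)] / len(ref)


# Hypothetical reference/hypothesis pair differing in the last word (one substitution).
print(word_error_rate("ഞാൻ വീട്ടിൽ പോകുന്നു", "ഞാൻ വീട്ടിൽ പോയി"))  # 1 edit / 3 words
```

Because insertions are counted against N, WER can exceed 1.0 when the recognizer outputs many extra words, so it is not a bounded error fraction in the way that (1 − accuracy) is.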








How to Cite

Thandil, R. K., Basheer K. P., M., & V. K., M. (2023). Empowering Accented Speech Analysis in Malayalam Through Cutting-Edge Fusion of Self Supervised Learning and Autoencoders. International Journal of Intelligent Systems and Applications in Engineering, 12(9s), 238–246. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/4269



Research Article