Neural Network-Based Approach for Identification and Classification of Speech Disfluency: The Apraxia of Speech

Authors

  • Ashwini P., S. H. Bharathi

Keywords:

Apraxia of speech, Voice activity, Zero crossing detection, MFCC, CNN classifier

Abstract

Over the past decade, signal processing, and speech processing in particular, has grown rapidly, driven in large part by the integration of Artificial Intelligence (AI) and Machine Learning (ML). AI/ML-based speech processing has notably advanced the identification of voice disfluencies, particularly in biomedical applications. Because disfluencies impede effective human communication, their identification has a wide range of potential applications. This paper examines apraxia of speech, a distinctive form of speech disfluency, and presents an algorithm for identifying it. The algorithm is built on a Convolutional Neural Network (CNN) that categorizes speech as normal or apraxic. The extracted features are fundamental frequency, Mel-Frequency Cepstral Coefficients (MFCC), short-term zero crossing rate (STZCR), and Teager energy (TEO). Using STZCR as a classification parameter makes the classifier notably more efficient than using TEO: with STZCR the classifier reaches 89 percent efficiency in speech categorization, whereas TEO alone yields 80 percent. These results underscore the role of AI/ML-based approaches in addressing speech disfluencies, particularly apraxia, and contribute to the early and accurate identification of communication disorders.
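The abstract names STZCR and TEO as features but does not give their formulas. The sketch below uses the common textbook definitions, which is an assumption rather than the authors' implementation: the discrete Teager energy operator psi[n] = x[n]^2 - x[n-1]*x[n+1], and the short-term zero crossing rate as the fraction of adjacent sample pairs in a frame whose signs differ. The function names, frame length, hop size, and the 100 Hz test tone are all hypothetical choices made for illustration.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns one value per interior sample, so the output has len(x) - 2 items.
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def short_time_zcr(x, frame_len, hop):
    """Per-frame fraction of adjacent sample pairs whose signs differ."""
    rates = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        rates.append(crossings / (frame_len - 1))
    return rates


# Hypothetical test signal (not from the paper): 100 Hz sine at 8 kHz, 0.1 s.
fs, f0 = 8000, 100
tone = [math.sin(2 * math.pi * f0 * n / fs) for n in range(800)]

zcr = short_time_zcr(tone, frame_len=200, hop=100)  # one rate per frame
teo = teager_energy(tone)                           # one value per interior sample
```

For a unit-amplitude sinusoid the Teager energy is constant and equal to sin^2(2*pi*f0/fs), while the zero crossing rate sits near 2*f0/fs, so both features respond to frequency content in different ways. How the paper frames the signal and feeds these values to the CNN is not stated in the abstract.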

Published

26.03.2024

How to Cite

S. H. Bharathi, A. P. (2024). Neural Network-Based Approach for Identification and Classification of Speech Disfluency: The Apraxia of Speech. International Journal of Intelligent Systems and Applications in Engineering, 12(3), 1689–1697. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/5579

Section

Research Article