Efficient Recognition and Classification of Stuttered Speech Signal using Deep Learning Technique
Keywords:
Automated Speech Recognition System (ASSR), Deep Neural Network (DNN), MFCC Feature Extraction, Stuttered Speech Recognition, Speech-to-Text API
Abstract
Speech recognition is a popular feature of modern devices that has simplified human-machine interaction. Users need not learn complex programming languages to communicate with their devices; they can issue voice commands to perform a range of tasks. However, such systems are of limited use when the voice input comes from a person who stutters. This work builds a system that not only classifies speech as stuttered or normal but also corrects it by removing the stuttered portions from the signal. The system takes the speaker’s audio as input and divides the speech signal into 300 ms segments. MFCC features are extracted from each segment and fed into the model, which classifies the segment as stuttered or normal. The stuttered segments are then removed to produce stutter-free audio, together with its text transcription.
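The segment, classify, and remove flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes non-overlapping 300 ms segments and a trailing segment that may be shorter, neither of which the abstract specifies. The function names and the 16 kHz sample rate are hypothetical, and `toy_is_stuttered` is an energy-threshold placeholder standing in for the MFCC-plus-DNN classifier, which is not reproduced here.

```python
from typing import Callable, List, Sequence

SEGMENT_MS = 300  # segment length stated in the abstract


def split_into_segments(samples: Sequence[float], sample_rate: int,
                        segment_ms: int = SEGMENT_MS) -> List[List[float]]:
    """Cut a mono signal into consecutive, non-overlapping segments.

    Assumption: no overlap between segments, and the final segment is
    kept even if it is shorter than segment_ms.
    """
    seg_len = int(sample_rate * segment_ms / 1000)
    if seg_len <= 0:
        raise ValueError("sample_rate * segment_ms must cover at least one sample")
    return [list(samples[i:i + seg_len]) for i in range(0, len(samples), seg_len)]


def remove_stuttered(segments: Sequence[Sequence[float]],
                     is_stuttered: Callable[[Sequence[float]], bool]) -> List[float]:
    """Concatenate only the segments that the classifier does not flag."""
    kept: List[float] = []
    for seg in segments:
        if not is_stuttered(seg):
            kept.extend(seg)
    return kept


def toy_is_stuttered(segment: Sequence[float]) -> bool:
    """Hypothetical placeholder classifier: flags high-energy segments.

    The paper's model classifies MFCC features with a neural network;
    this threshold rule exists only to make the sketch runnable.
    """
    energy = sum(x * x for x in segment) / len(segment)
    return energy > 0.5


if __name__ == "__main__":
    rate = 16000  # assumed sample rate, not given in the abstract
    # One second of toy audio: quiet, then one loud 300 ms stretch, then quiet.
    signal = [0.1] * 4800 + [0.9] * 4800 + [0.1] * 6400
    segments = split_into_segments(signal, rate)
    cleaned = remove_stuttered(segments, toy_is_stuttered)
    print(len(segments), len(signal), len(cleaned))
```

In a real pipeline, the placeholder would be replaced by a function that extracts MFCC features from the segment and passes them to a trained model; the surrounding segmentation and concatenation logic would stay the same.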
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.