Optimizing Speech Synthesis for Efficient Text-to-Speech Conversion with Enhanced Robustness and Resource Efficiency

Authors

  • Mukta Sandhu, Skill Department of Computer Science and Engineering, Shri Vishwakarma Skill University, Palwal, India

Keywords:

Text-To-Speech Conversion, Linguistic Analysis, Prosody Prediction, Butterfly Optimization, Convolutional Neural Network, Waveform Generation

Abstract

Speech synthesis is the conversion of text to speech, but it becomes very challenging in the presence of background noise. Existing approaches are also time-consuming, costly, and power-hungry. To overcome these issues, a Butterfly-based Convolutional Neural System (BbCNS) is designed. First, input text from a set of users is collected and used to train the system, and preprocessing removes errors in the dataset and prepares the text data for a specific context. Data normalization then transforms the text into a canonical, consistent form. Next, linguistic analysis is used to understand the content of the text and to identify the constituent morphemes of each word. Prosodic prominence is then predicted from the written text. Finally, the waveform is generated to convert the text into speech. The outcomes of the designed model are validated against other prevailing models in terms of accuracy, sensitivity, specificity, precision, and computation time.
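The abstract names a butterfly-based optimizer but does not say which quantities it tunes (for example, CNN hyperparameters or weights). For orientation only, here is a minimal sketch of the standard Butterfly Optimization Algorithm (BOA) minimizing a generic objective. It is not the author's BbCNS implementation. The function name `butterfly_optimize`, its parameters, the choice of stimulus intensity as the absolute objective value, the fixed sensory modality `c`, the greedy acceptance rule, and the sphere test function are all assumptions made for illustration.

```python
import random
from typing import Callable, List, Tuple


def butterfly_optimize(
    objective: Callable[[List[float]], float],
    dim: int,
    lower: float,
    upper: float,
    n_butterflies: int = 20,
    max_iter: int = 100,
    p: float = 0.8,  # switch probability: global vs. local search
    c: float = 0.01,  # sensory modality (kept fixed in this sketch)
    a: float = 0.1,  # power exponent applied to stimulus intensity
    seed: int = 0,
) -> Tuple[List[float], float, List[float]]:
    """Minimise `objective` over [lower, upper]^dim with a basic BOA loop.

    Hypothetical helper for illustration. Returns
    (best_position, best_value, best_value_after_each_iteration).
    """
    rng = random.Random(seed)

    def clip(v: float) -> float:
        # Keep every coordinate inside the search bounds.
        return min(max(v, lower), upper)

    pop = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_butterflies)]
    fit = [objective(x) for x in pop]
    best_i = min(range(n_butterflies), key=lambda i: fit[i])
    best_x, best_f = pop[best_i][:], fit[best_i]
    history = [best_f]

    for _ in range(max_iter):
        for i in range(n_butterflies):
            # Fragrance f = c * I^a, taking the stimulus intensity I as |objective value|.
            frag = c * abs(fit[i]) ** a
            r = rng.random()
            if r < p:
                # Global search: step towards r^2 * (best butterfly found so far).
                cand = [clip(x + (r * r * g - x) * frag) for x, g in zip(pop[i], best_x)]
            else:
                # Local search: step built from two randomly chosen butterflies j and k.
                eps = rng.random()
                j, k = rng.randrange(n_butterflies), rng.randrange(n_butterflies)
                cand = [
                    clip(x + (eps * eps * xj - xk) * frag)
                    for x, xj, xk in zip(pop[i], pop[j], pop[k])
                ]
            cand_f = objective(cand)
            # Greedy acceptance: keep the move only if it improves this butterfly.
            if cand_f < fit[i]:
                pop[i], fit[i] = cand, cand_f
                if cand_f < best_f:
                    best_x, best_f = cand[:], cand_f
        history.append(best_f)

    return best_x, best_f, history


if __name__ == "__main__":
    # Toy objective: the sphere function, minimised at the origin.
    def sphere(x: List[float]) -> float:
        return sum(v * v for v in x)

    bx, bf, hist = butterfly_optimize(sphere, dim=5, lower=-5.0, upper=5.0, max_iter=200, seed=42)
    print(f"initial best {hist[0]:.4f} -> final best {bf:.4f}")
```

Because of the greedy acceptance step, the best value recorded in `history` never increases. Many BOA implementations also raise `c` gradually over the run; this sketch keeps it constant for simplicity.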
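The abstract reports accuracy, sensitivity, specificity, and precision but does not define them or say how synthesized speech is labeled for scoring. Assuming a binary labeling (1 = positive, 0 = negative), these four metrics follow directly from confusion-matrix counts. The helper name `binary_metrics` is hypothetical, and returning 0.0 when a denominator is zero is a design choice of this sketch.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and precision for 0/1 labels (1 = positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts.
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Return 0.0 instead of dividing by zero when a class is absent.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # recall / true-positive rate
        "specificity": ratio(tn, tn + fp),  # true-negative rate
        "precision": ratio(tp, tp + fp),
    }


if __name__ == "__main__":
    m = binary_metrics([1, 1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 1])
    # expected: accuracy = 5/7, sensitivity = 2/3, specificity = 3/4, precision = 2/3
    print(m)
```

Sensitivity and specificity are complementary: the first measures how many true positives are recovered, and the second how many true negatives are correctly rejected. For that reason, both are usually reported alongside accuracy.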

Published

24.03.2024

How to Cite

Sandhu, M. (2024). Optimizing Speech Synthesis for Efficient Text-to-Speech Conversion with Enhanced Robustness and Resource Efficiency. International Journal of Intelligent Systems and Applications in Engineering, 12(18s), 813–819. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/5170

Section

Research Article