Silent Interpreter: Analysis of Lip Movement and Extracting Speech Using Deep Learning

Authors

  • Shwetha K S, Rohith M K, Sakshi Prashant Yandagoudar, Sinchana, Ameen Hafeez

Keywords

lip reading, deep learning, convolutional neural networks, recurrent neural networks, CTC loss, speech recognition

Abstract

Lip reading presents a captivating avenue for advancing speech recognition algorithms, leveraging visual cues from lip movements to recognise spoken words. This paper introduces a novel method employing deep neural networks to convert lip motions into
textual representations. The methodology integrates convolutional neural networks for visual feature extraction, recurrent neural networks to capture temporal context, and the Connectionist Temporal Classification loss function for aligning lip features with phonemes.
Additionally, dynamic learning rate scheduling and a custom callback mechanism for training visualization are incorporated into the process. After training on a sizeable dataset, the model demonstrates notable convergence, showcasing its ability to discern intricate temporal correlations.
Comprehensive evaluations, combining quantitative metrics and qualitative assessments, validate the model's effectiveness. Visual inspection of the model's lip-reading output and evaluation against standard speech-recognition metrics highlight its performance. The study also examines how different model topologies and hyperparameters affect performance, providing valuable insights for future research directions. This research contributes a deep learning framework for accurate and efficient speech recognition, expanding the landscape of lip-reading technologies. The findings open paths for further refinement and deployment across diverse domains, including assistive technologies, audio-visual communication systems, and human-computer interaction.
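The pipeline described above uses the Connectionist Temporal Classification (CTC) loss to align frame-wise visual features with a label sequence without frame-level annotations. As an illustrative sketch (not the authors' implementation), the CTC forward algorithm underlying this loss can be written in plain NumPy; the function name `ctc_loss`, the toy tensor shapes, and the choice of blank index are assumptions made for this example.

```python
import numpy as np

def ctc_loss(log_probs, target, blank=0):
    """Negative log-likelihood of a (non-empty) `target` under the CTC
    alignment model. log_probs: (T, C) per-frame log-probabilities."""
    T, C = log_probs.shape
    # Interleave blanks with the labels: [a, b] -> [blank, a, blank, b, blank]
    ext = [blank]
    for lbl in target:
        ext += [lbl, blank]
    S = len(ext)
    # alpha[t, s]: log-probability of all alignments of ext[:s+1] to frames 0..t
    alpha = np.full((T, S), -np.inf)
    alpha[0, 0] = log_probs[0, ext[0]]
    alpha[0, 1] = log_probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            cand = [alpha[t - 1, s]]          # stay on the same symbol
            if s > 0:
                cand.append(alpha[t - 1, s - 1])  # advance by one
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                cand.append(alpha[t - 1, s - 2])  # skip a blank between distinct labels
            alpha[t, s] = np.logaddexp.reduce(cand) + log_probs[t, ext[s]]
    # Valid alignments may end on the last label or the trailing blank
    return -np.logaddexp(alpha[T - 1, S - 1], alpha[T - 1, S - 2])
```

For a uniform two-frame, two-class distribution and target `[1]`, the three valid alignments (1, 1), (blank, 1), and (1, blank) each have probability 0.25, so the loss is -log(0.75). In practice, frameworks provide batched, differentiable versions of this loss that the recurrent network's outputs feed into directly.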


References

Wang, H., Pu, G., & Chen, T. "A Lip Reading Method Based on 3D Convolutional Vision Transformer."

Deshmukh, N., Ahire, A., Bhandari, S. H., Mali, A., & Warkari, K. (2021). "Vision based Lip Reading System using Deep Learning." In 2021 International Conference on Computing, Communication and Green Engineering (CCGE) (pp. 1-6). IEEE. doi: 10.1109/CCGE50943.2021.9776430

Lu, Y., & Li, H. (2019). "Automatic Lip-Reading System Based on Deep Convolutional Neural Network and Attention-Based Long Short-Term Memory." Appl. Sci., 9, 1599. doi: 10.3390/app9081599

Scanlon, P., Reilly, R., & de Chazal, P. (2003). "Visual Feature Analysis for Automatic Speech reading." In International Conference on Audio-Visual Speech Processing.

Kapkar, P. P., & Bharkad, S. D. (2019). "Lip Feature Extraction and Movement Recognition Methods." International Journal of Scientific & Technology Research, 8.

Ozcan, T., & Basturk, A. (2019). "Lip Reading Using Convolutional Neural Networks with and without Pre-Trained Models." Balkan Journal of Electrical and Computer Engineering, 7(2).

Garg, A., Noyola, J., & Bagadia, S. (2016). "Lip reading using CNN and LSTM."

Gutierrez, A., & Robert, Z-A. (2017). "Lip Reading Word Classification." Stanford University.

Fenghour, S., Chen, D., Guo, K., & Xiao, P. (2020). "Lip Reading Sentences Using Deep Learning With Only Visual Cues." IEEE Access, 8, 215516-215530. doi: 10.1109/ACCESS.2020.3040906

Vayadande, K., Adsare, T., Agrawal, N., Dharmik, T., Patil, A., & Zod, S. (2023). "LipReadNet: A Deep Learning Approach to Lip Reading." In 2023 International Conference on Applied Intelligence and Sustainable Computing (ICAISC) (pp. 1-6). IEEE. doi: 10.1109/ICAISC58445.2023.10200426


Published

07.05.2024

How to Cite

Shwetha K S. (2024). Silent Interpreter: Analysis of Lip Movement and Extracting Speech Using Deep Learning. International Journal of Intelligent Systems and Applications in Engineering, 12(3), 3312–3315. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/5938

Issue

Section

Research Article