Silent Interpreter: Analysis of Lip Movement and Extracting Speech Using Deep Learning

Authors

  • Shwetha K S, Rohith M K, Sakshi Prashant Yandagoudar, Sinchana, Ameen Hafeez

Keywords

lip reading, deep learning, convolutional neural networks, recurrent neural networks, CTC loss, speech recognition

Abstract

Lip reading presents a captivating avenue for advancing speech recognition algorithms, leveraging visual cues from lip movements to recognise spoken words. This paper introduces a novel method employing deep neural networks to convert lip motions into
textual representations. The methodology integrates convolutional neural networks for visual feature extraction, recurrent neural networks to capture temporal context, and the Connectionist Temporal Classification loss function for aligning lip features with phonemes.
Additionally, dynamic learning rate scheduling and a custom callback mechanism for training visualization are incorporated into the process. After training on a sizeable dataset, the model demonstrates notable convergence, showcasing its ability to discern intricate temporal correlations.
Comprehensive evaluations, combining quantitative metrics and qualitative assessments, validate the model's effectiveness. Visual inspection of the model's lip-reading output and evaluation against standard speech-recognition metrics highlight its performance. The study also examines how different model topologies and hyperparameters affect performance, providing valuable insights for future research directions. This research contributes a deep learning framework for accurate and efficient speech recognition, expanding the landscape of lip-reading technologies. The findings open paths for further refinement and deployment across diverse domains, including assistive technologies, audio-visual communication systems, and human-computer interaction.
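The pipeline described above uses the Connectionist Temporal Classification (CTC) loss to align frame-wise visual features with a label sequence without frame-level annotations. As an illustrative sketch (not the authors' implementation), the CTC forward algorithm underlying this loss can be written in plain NumPy; the function name `ctc_loss`, the toy tensor shapes, and the choice of blank index are assumptions made for this example.

```python
import numpy as np

def ctc_loss(log_probs, target, blank=0):
    """Negative log-likelihood of a (non-empty) `target` under the CTC
    alignment model. log_probs: (T, C) per-frame log-probabilities."""
    T, C = log_probs.shape
    # Interleave blanks with the labels: [a, b] -> [blank, a, blank, b, blank]
    ext = [blank]
    for lbl in target:
        ext += [lbl, blank]
    S = len(ext)
    # alpha[t, s]: log-probability of all alignments of ext[:s+1] to frames 0..t
    alpha = np.full((T, S), -np.inf)
    alpha[0, 0] = log_probs[0, ext[0]]
    alpha[0, 1] = log_probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            cand = [alpha[t - 1, s]]          # stay on the same symbol
            if s > 0:
                cand.append(alpha[t - 1, s - 1])  # advance by one
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                cand.append(alpha[t - 1, s - 2])  # skip a blank between distinct labels
            alpha[t, s] = np.logaddexp.reduce(cand) + log_probs[t, ext[s]]
    # Valid alignments may end on the last label or the trailing blank
    return -np.logaddexp(alpha[T - 1, S - 1], alpha[T - 1, S - 2])
```

For a uniform two-frame, two-class distribution and target `[1]`, the three valid alignments (1, 1), (blank, 1), and (1, blank) each have probability 0.25, so the loss is -log(0.75). In practice, frameworks provide batched, differentiable versions of this loss that the recurrent network's outputs feed into directly.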


References

Wang, H., Pu, G., & Chen, T. "A Lip Reading Method Based on 3D Convolutional Vision Transformer."

Deshmukh, N., Ahire, A., Bhandari, S. H., Mali, A., & Warkari, K. (2021). "Vision based Lip Reading System using Deep Learning." In 2021 International Conference on Computing, Communication and Green Engineering (CCGE) (pp. 1-6). IEEE. doi: 10.1109/CCGE50943.2021.9776430

Lu, Y., & Li, H. (2019). "Automatic Lip-Reading System Based on Deep Convolutional Neural Network and Attention-Based Long Short-Term Memory." Appl. Sci., 9, 1599. doi: 10.3390/app9081599

Scanlon, P., Reilly, R., & de Chazal, P. (2003). "Visual Feature Analysis for Automatic Speech reading." In International Conference on Audio-Visual Speech Processing.

Kapkar, P. P., & Bharkad, S. D. (2019). "Lip Feature Extraction and Movement Recognition Methods." International Journal of Scientific & Technology Research, 8.

Ozcan, T., & Basturk, A. (2019). "Lip Reading Using Convolutional Neural Networks with and without Pre-Trained Models." Balkan Journal of Electrical and Computer Engineering, 7(2).

Garg, A., Noyola, J., & Bagadia, S. (2016). "Lip reading using CNN and LSTM."

Gutierrez, A., & Robert, Z-A. (2017). "Lip Reading Word Classification." Stanford University.

Fenghour, S., Chen, D., Guo, K., & Xiao, P. (2020). "Lip Reading Sentences Using Deep Learning With Only Visual Cues." IEEE Access, 8, 215516-215530. doi: 10.1109/ACCESS.2020.3040906

Vayadande, K., Adsare, T., Agrawal, N., Dharmik, T., Patil, A., & Zod, S. (2023). "LipReadNet: A Deep Learning Approach to Lip Reading." In 2023 International Conference on Applied Intelligence and Sustainable Computing (ICAISC) (pp. 1-6). IEEE. doi: 10.1109/ICAISC58445.2023.10200426


Published

07.05.2024

How to Cite

Shwetha K S. (2024). Silent Interpreter: Analysis of Lip Movement and Extracting Speech Using Deep Learning. International Journal of Intelligent Systems and Applications in Engineering, 12(3), 3312–3315. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/5938

Issue

Section

Research Article