A Hybrid CNN–LSTM Model for Emotion Prediction from Visual Data in Children

Authors

  • Vikas Jangra, Sumeet Gill, Binny Sharma, Archna Kirar

Keywords

Multimodal Emotion Recognition, Facial Expression Analysis, Gesture Recognition, Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM)

Abstract

Emotion recognition is a crucial component of human-computer interaction, enabling systems to respond intelligently to user emotions. This paper proposes a hybrid framework for emotion recognition based on facial and gesture-based expression models: a deep convolutional neural network (CNN) analyzes facial expressions while a long short-term memory network (LSTM) recognizes gestures, yielding state-of-the-art performance. A multimodal fusion technique combines the features of both modalities to boost emotion classification accuracy. Extensive evaluation on benchmark datasets achieved an accuracy of 87.5% in predicting emotions across six basic emotions, a neutral state, valence, and nine complex emotions. Emotion recognition in children aged 4 to 14 years is challenging because their emotional expressions are subtle and still developing, and few published works address this population. The novelty of this paper is a multimodal emotion recognition model for children that fuses facial expressions and gestures in a new architecture with multimodal input fusion, offering a richer and more context-aware recognition framework that is especially beneficial in dynamic child-interaction environments. This work advances emotion recognition systems for younger populations, with applications in education, child development, and healthcare.
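The paper's full architecture details are not reproduced on this page. As a rough illustration of the pipeline the abstract describes (a CNN branch for face images, an LSTM branch over gesture sequences, and feature-level fusion into a shared classifier), the PyTorch sketch below wires the three pieces together. All input shapes, layer sizes, and the 16-way categorical head (six basic emotions, neutral, and nine complex emotions; the valence output is omitted for brevity) are illustrative assumptions, not the authors' published configuration.

```python
# Minimal sketch of a hybrid CNN-LSTM multimodal emotion classifier.
# Shapes, layer sizes, and the 16-class head are assumptions for
# illustration only, not the configuration reported in the paper.
import torch
import torch.nn as nn


class FaceCNN(nn.Module):
    """CNN branch: extracts a feature vector from a single face crop."""

    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 64x64 -> 32x32
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 32x32 -> 16x16
            nn.AdaptiveAvgPool2d(1),              # global average pool
        )
        self.fc = nn.Linear(64, feat_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.conv(x).flatten(1)               # (B, 64)
        return self.fc(h)                         # (B, feat_dim)


class GestureLSTM(nn.Module):
    """LSTM branch: encodes a sequence of per-frame gesture descriptors
    (e.g., flattened body keypoints) into a fixed-length feature vector."""

    def __init__(self, in_dim: int = 34, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        _, (h_n, _) = self.lstm(seq)              # h_n: (1, B, hidden)
        return h_n[-1]                            # (B, hidden)


class HybridEmotionNet(nn.Module):
    """Feature-level fusion: concatenate both modality embeddings,
    then classify into the emotion categories."""

    def __init__(self, n_classes: int = 16):
        super().__init__()
        self.face = FaceCNN(feat_dim=128)
        self.gesture = GestureLSTM(in_dim=34, hidden=128)
        self.head = nn.Sequential(
            nn.Linear(128 + 128, 64), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(64, n_classes),
        )

    def forward(self, face_img, gesture_seq):
        fused = torch.cat([self.face(face_img),
                           self.gesture(gesture_seq)], dim=1)
        return self.head(fused)                   # (B, n_classes) logits


# Smoke test with dummy inputs: a batch of 4 face crops (64x64 RGB) and
# 30-frame gesture sequences of 17 (x, y) keypoints flattened to 34 values.
model = HybridEmotionNet()
logits = model(torch.randn(4, 3, 64, 64), torch.randn(4, 30, 34))
print(logits.shape)  # torch.Size([4, 16])
```

Feature-level (early) fusion, as sketched here, is one of several options; decision-level fusion, in which each branch is classified separately and the scores are combined afterwards, is a common alternative in the cited literature (e.g., Gunes & Piccardi, 2007).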

References

Alhussein, M. (2016). Automatic facial emotion recognition using the weber local descriptor for the e-Healthcare system. Cluster Computing, 19, 99-108. https://doi.org/10.1007/s10586-016-0535-3

Alzubaidi, L., Zhang, J., Humaidi, A. J., Al-Dujaili, A., Duan, Y., Al-Shamma, O., ... & Farhan, L. (2021). Review of deep learning: concepts, CNN architectures, challenges, applications, and future directions. Journal of Big Data, 8, 1-74. https://doi.org/10.1186/s40537-021-00444-8

Barros, P., Jirak, D., Weber, C., & Wermter, S. (2015). Multimodal emotional state recognition using sequence-dependent deep hierarchical features. Neural Networks, 72, 140-151. https://doi.org/10.1016/j.neunet.2015.09.009

Bartlett, M. S., Littlewort, G., Lainscsek, C., Fasel, I., & Movellan, J. (2004, October). Machine learning methods for fully automatic recognition of facial expressions and facial actions. In 2004 IEEE International Conference on Systems, Man and Cybernetics (IEEE Cat. No. 04CH37583) (Vol. 1, pp. 592-597). IEEE. https://doi.org/10.1109/ICSMC.2004.1398364

Chen, S., Tian, Y., Liu, Q., & Metaxas, D. N. (2013). Recognizing expressions from face and body gestures by temporal normalized motion and appearance features. Image and Vision Computing, 31(2), 175-185. https://doi.org/10.1016/j.imavis.2012.06.014

Gunes, H., & Piccardi, M. (2005, August). Fusing face and body gestures for machine recognition of emotions. In ROMAN 2005, IEEE International Workshop on Robot and Human Interactive Communication (pp. 306-311). IEEE. https://doi.org/10.1109/ROMAN.2005.1513796

Gunes, H., & Piccardi, M. (2007). Bi-modal emotion recognition from expressive face and body gestures. Journal of Network and Computer Applications, 30(4), 1334-1345. https://doi.org/10.1016/j.jnca.2006.09.007

Gunes, H., & Piccardi, M. (2008). Automatic temporal segment detection and affect recognition from face and body display. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 39(1), 64-84. https://doi.org/10.1109/TSMCB.2008.927269

Hudlicka, E. (2003). To feel or not to feel: The role of affect in human–computer interaction. International Journal of Human-Computer Studies, 59(1-2), 1-32. https://doi.org/10.1016/S1071-5819(03)00047-8

Ilyas, C. M. A., Nunes, R., Nasrollahi, K., Rehm, M., & Moeslund, T. B. (2021, February). Deep emotion recognition through upper body movements and facial expression. In VISIGRAPP (5: VISAPP) (pp. 669-679). https://doi.org/10.5220/0010359506690679

Karatay, B., Bestepe, D., Sailunaz, K., Ozyer, T., & Alhajj, R. (2022, March). A multi-modal emotion recognition system based on the CNN-transformer deep learning technique. In 2022 7th International Conference on Data Science and Machine Learning Applications (CDMA) (pp. 145-150). IEEE. https://doi.org/10.1109/CDMA54072.2022.00029

Keshari, T., & Palaniswamy, S. (2019, July). Emotion recognition using feature-level fusion of facial expressions and body gestures. In 2019 International Conference on Communication and Electronics Systems (ICCES) (pp. 1184-1189). IEEE. https://doi.org/10.1109/ICCES45898.2019.9002175

Ko, B. C. (2018). A brief review of facial emotion recognition based on visual information. Sensors, 18(2), 401. https://doi.org/10.3390/s18020401

Llurba, C., & Palau, R. (2024). Real-Time Emotion Recognition for Improving the Teaching–Learning Process: A Scoping Review. Journal of Imaging, 10(12), 313. https://doi.org/10.3390/jimaging10120313

Mathew, A. R., Al Hajj, A., & Al Abri, A. (2011, June). Human-computer interaction (HCI): An overview. In 2011 IEEE International Conference on Computer Science and Automation Engineering (Vol. 1, pp. 99-100). IEEE. https://doi.org/10.1109/CSAE.2011.5953178

Newen, A., Welpinghus, A., & Juckel, G. (2015). Emotion recognition as pattern recognition: the relevance of perception. Mind & Language, 30(2), 187-208. https://doi.org/10.1111/mila.12077

Nojavanasghari, B., Baltrušaitis, T., Hughes, C. E., & Morency, L. P. (2016, October). EmoReact: A multimodal approach and dataset for recognizing emotional responses in children. In Proceedings of the 18th ACM International Conference on Multimodal Interaction (pp. 137-144). https://doi.org/10.1145/2993148.2993168

Nunes, A. R. V. (2019). Deep emotion recognition through upper body movements and facial expression (Master's thesis, Aalborg University). https://projekter.aau.dk/projekter/files/307194482/Thesis_Rita_Nunes.pdf

Rathod, M., Dalvi, C., Kaur, K., Patil, S., Gite, S., Kamat, P., ... & Gabralla, L. A. (2022). Kids’ emotion recognition using various deep-learning models with explainable AI. Sensors, 22(20), 8066. https://doi.org/10.3390/s22208066

Verma, B., & Choudhary, A. (2021). Affective state recognition from hand gestures and facial expressions using Grassmann manifolds. Multimedia Tools and Applications, 80(9), 14019-14040. https://doi.org/10.1007/s11042-020-10341-6

Wei, J., Hu, G., Yang, X., Luu, A. T., & Dong, Y. (2024). Learning facial expression and body gesture visual information for video emotion recognition. Expert Systems with Applications, 237, 121419. https://doi.org/10.1016/j.eswa.2023.121419

Yang, R., Singh, S. K., Tavakkoli, M., Amiri, N., Yang, Y., Karami, M. A., & Rai, R. (2020). CNN-LSTM deep learning architecture for computer vision-based modal frequency detection. Mechanical Systems and Signal Processing, 144, 106885. https://doi.org/10.1016/j.ymssp.2020.106885

Lopez-Rincon, A. (2019, February). Emotion recognition using facial expressions in children using the NAO robot. In 2019 International Conference on Electronics, Communications and Computers (CONIELECOMP) (pp. 146-153). IEEE. https://doi.org/10.1109/CONIELECOMP.2019.8673111

Filntisis, P. P., Efthymiou, N., Koutras, P., Potamianos, G., & Maragos, P. (2019). Fusing body posture with facial expressions for joint recognition of affect in child–robot interaction. IEEE Robotics and Automation Letters, 4(4), 4011-4018. https://doi.org/10.1109/LRA.2019.2930434

Filntisis, P. P., Efthymiou, N., Potamianos, G., & Maragos, P. (2021, August). An audiovisual child emotion recognition system for child-robot interaction applications. In 2021 29th European Signal Processing Conference (EUSIPCO) (pp. 791-795). IEEE. https://doi.org/10.23919/EUSIPCO54536.2021.9616106

Suhan, S., Kalaichelvan, K., Samarage, L., Alahakoon, D., Samarasinghe, P., & Nadeeshani, M. (2022, November). Automated evaluation of child emotion expression and recognition abilities. In 2022 International Conference on Information Technology Systems and Innovation (ICITSI) (pp. 388-393). IEEE. https://doi.org/10.1109/ICITSI56531.2022.9970990

Pandyan, U. M., Sindha, M. M. R., Kannapiran, P., Marimuthu, S., & Anbunathan, V. (2023). Application of machine and deep learning techniques to facial emotion recognition in infants. In Emotion Recognition - Recent Advances, New Perspectives, and Applications. IntechOpen. https://www.intechopen.com/chapters/85877

Published

11.12.2024

How to Cite

Jangra, V., Gill, S., Sharma, B., & Kirar, A. (2024). A Hybrid CNN–LSTM Model for Emotion Prediction from Visual Data in Children. International Journal of Intelligent Systems and Applications in Engineering, 12(23s), 3435–3444. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/7750

Issue

Vol. 12 No. 23s (2024)

Section

Research Article