A Hybrid CNN–LSTM Model for Emotion Prediction from Visual Data in Children
Keywords:
Multimodal Emotion Recognition, Facial Expression Analysis, Gesture Recognition, Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM)

Abstract
Emotion recognition is a crucial component of human-computer interaction, enabling systems to respond intelligently to user emotions. This paper proposes a hybrid framework for emotion recognition based on facial and gesture-based expression models: a deep convolutional neural network (CNN) analyzes facial expressions, while a long short-term memory (LSTM) network recognizes gestures, providing state-of-the-art performance. A multimodal fusion technique combines features from both modalities to boost emotion-classification accuracy. In extensive evaluation on benchmark datasets, the framework achieved 87.5% accuracy in predicting six basic emotions, a neutral state, valence, and nine complex emotions. Emotion recognition in children aged 4 to 14 years is challenging because their emotional expressions are subtle and still developing, and few published works address this age group. The novelty of this paper is a multimodal emotion recognition model for children that combines facial expressions and gestures through a novel architecture and multimodal input fusion, offering a richer and more context-aware emotion recognition framework that is especially beneficial in dynamic child-interaction environments. This work advances emotion recognition systems for younger populations, with applications in education, child development, and healthcare.
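To make the described design concrete, the following is a minimal, hypothetical PyTorch sketch of the hybrid architecture the abstract outlines: a CNN branch for face crops, an LSTM branch for gesture sequences, and feature-level fusion before classification. The paper does not publish its exact configuration; every layer size, the 34-dimensional keypoint input, and the 7-class output head below are illustrative assumptions, not the authors' implementation.

# Minimal sketch of a hybrid CNN-LSTM with feature-level fusion.
# All dimensions and module names are assumptions for illustration.
import torch
import torch.nn as nn

class FacialCNN(nn.Module):
    """CNN branch: extracts a feature vector from a single face crop."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, feat_dim)

    def forward(self, x):  # x: (batch, 3, H, W)
        return self.fc(self.conv(x).flatten(1))

class GestureLSTM(nn.Module):
    """LSTM branch: summarizes a sequence of gesture keypoint frames."""
    def __init__(self, in_dim=34, feat_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, feat_dim, batch_first=True)

    def forward(self, x):  # x: (batch, T, in_dim)
        _, (h, _) = self.lstm(x)
        return h[-1]  # last hidden state as the gesture feature

class FusionClassifier(nn.Module):
    """Feature-level fusion: concatenate both modalities, then classify."""
    def __init__(self, n_classes=7, feat_dim=128):
        super().__init__()
        self.face = FacialCNN(feat_dim)
        self.gesture = GestureLSTM(feat_dim=feat_dim)
        self.head = nn.Linear(2 * feat_dim, n_classes)

    def forward(self, face_img, gesture_seq):
        fused = torch.cat([self.face(face_img), self.gesture(gesture_seq)], dim=1)
        return self.head(fused)  # emotion logits

# Usage with dummy inputs: 4 face crops and 16-frame gesture sequences.
model = FusionClassifier()
logits = model(torch.randn(4, 3, 64, 64), torch.randn(4, 16, 34))
print(logits.shape)  # torch.Size([4, 7])

Concatenation is the simplest feature-level fusion; attention-weighted or gated fusion would be drop-in alternatives at the same point in the network.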