LRCN-HTP: Leveraging Hybrid Temporal Processing for Enhanced Activity Recognition in Multi-Human Scenarios
Keywords:
Convolutional Neural Networks, Deep Learning, Feature Extraction, Hybrid Long-Term Recurrent Convolutional-Network Temporal Processing, Multi-Human Activity Recognition, Spatial Context, Temporal Convolutional Networks, Temporal Dependency

Abstract
Multi-human activity recognition remains a challenging domain, with significant research focused on utilizing diverse datasets to accurately identify human activities in everyday scenarios. This paper introduces an innovative approach that employs a Hybrid Long-Term Recurrent Convolutional-Network Temporal Processing (LRCN-HTP) model for enhanced multi-human activity recognition. By integrating advanced computing technology with deep neural networks, the approach addresses socially relevant challenges, paving the way for applications that require a nuanced understanding of human interactions. The LRCN-HTP model synergizes the spatial context understanding of Convolutional Neural Networks (CNNs) with the long-term temporal dependency management of Recurrent Neural Networks (RNNs), particularly LSTM networks. In doing so, it offers a comprehensive framework that leverages the strengths of CNNs for feature extraction and of LSTMs for sequential data processing. This hybrid approach ensures that the model captures both the fine-grained details and the broader patterns of human activity. To enhance the model's performance and mitigate common deep learning challenges, such as the dependency on extensive labeled datasets, the LRCN-HTP architecture integrates dilated and causal convolutions within its Temporal Convolutional Networks (TCNs) to extend the receptive field while maintaining the sequence's temporal integrity. The robust feature maps generated by the convolutional layers undergo a sophisticated learning process involving various activation functions and filters, and are subsequently integrated with the LSTM's sequential processing to form accurate predictions. Our architecture is tailored to effectively address the intricate problem of sequence prediction with spatial inputs. Tested on the extensive UCF101 dataset, our proposed LRCN-HTP model achieves an impressive accuracy of 97.22%, outperforming several existing models.
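The CNN-to-LSTM flow described above can be sketched minimally in NumPy. This is an illustrative assumption of the pattern, not the paper's implementation: `cnn_features` is a hypothetical stand-in for a real convolutional backbone (here, simple spatial mean-pooling), and all shapes and gate orderings are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; gate order assumed [input, forget, cell, output]."""
    z = W @ x + U @ h + b
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    c = f * c + i * np.tanh(g)   # update cell state
    h = o * np.tanh(c)           # emit hidden state
    return h, c

def lrcn_forward(frames, cnn_features, W, U, b, hidden=8):
    """LRCN pattern: extract spatial features per frame, then recur over time."""
    h, c = np.zeros(hidden), np.zeros(hidden)
    for frame in frames:
        x = cnn_features(frame)          # per-frame spatial features
        h, c = lstm_step(x, h, c, W, U, b)
    return h                              # final state feeds a classifier head

# Toy run with random weights and a mean-pooling "CNN" stand-in.
rng = np.random.default_rng(0)
feat, hidden = 4, 8
W = rng.normal(size=(4 * hidden, feat)) * 0.1
U = rng.normal(size=(4 * hidden, hidden)) * 0.1
b = np.zeros(4 * hidden)
frames = rng.normal(size=(5, 16))                    # 5 frames of 16 "pixels"
pool = lambda f: f.reshape(4, 4).mean(axis=1)        # stand-in feature extractor
h_final = lrcn_forward(frames, pool, W, U, b, hidden=hidden)
```

The key design point is the division of labor: the convolutional stage sees only one frame at a time, while the recurrent stage is the sole keeper of temporal context.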
The results underscore the model's reliability and superior capability in recognizing various activities, confirming the effectiveness of our integrated approach in human activity recognition.
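The dilated causal convolutions mentioned in the abstract can be illustrated with a short NumPy sketch. This is a generic TCN building block under assumed conventions, not the paper's layer: the output at time t depends only on inputs at t, t-d, t-2d, ..., so no future frames leak backward, and the receptive field of one layer is (k-1)*d + 1 (stacking layers with doubling dilations grows it exponentially).

```python
import numpy as np

def causal_dilated_conv1d(x, kernel, dilation=1):
    """Causal dilated 1-D convolution over a sequence x.

    Left-pads with zeros so that output[t] is a weighted sum of
    x[t], x[t - d], x[t - 2d], ... only -- never of future samples.
    """
    k = len(kernel)
    pad = (k - 1) * dilation                     # causal padding on the left only
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    return np.array([
        sum(kernel[j] * xp[t + pad - j * dilation] for j in range(k))
        for t in range(len(x))
    ])

y = causal_dilated_conv1d([1.0, 2.0, 3.0, 4.0], [1.0, 1.0], dilation=2)
```

With kernel [1, 1] and dilation 2, each output is x[t] + x[t-2], so the sequence [1, 2, 3, 4] maps to [1, 2, 4, 6]; changing x[3] leaves the first three outputs untouched, which is exactly the temporal integrity the abstract refers to.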
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.