A Robust LSTM Model for Video Detection and Classification: An Optimization Procedure
Keywords: Activation Function, Batch Normalization, Deep Temporal Features, Human Activity Recognition, Long Short-Term Memory, Video Detection

Abstract
Human activity recognition (HAR) plays a vital role in a wide range of fields, such as monitoring and healthcare. Researchers now apply various machine learning and deep learning techniques to analyze and interpret the collected data; deep learning in particular has improved the performance of HAR systems by extracting high-level features. This work introduces a novel methodology for recognizing abnormal human activities such as abuse, assault, arson, arrest, and fighting, built on a VGG16-based LSTM neural network. The proposed methodology combines features from the LSTM layers to generate a representation that further enhances classification accuracy. Because human action recognition and video detection are complex, training a model can be very expensive; our goal was therefore to minimize training time and improve accuracy by implementing a low-cost LSTM structure for video detection. The paper presents an LSTM structure, called Context-LSTM, that processes deep temporal features and, after hyperparameter tuning, performs well in validation on the entire UCF101 dataset. The proposed structure reduces training time while maintaining top-rated accuracy and minimizing GPU memory usage. The classification model categorized the "fighting," "arson," "abuse," and "arrest" class labels with 95.43%, 95.97%, and 97.37% accuracy, respectively. The proposed model also tested well, achieving an accuracy of 95.81% with a misclassification error rate of 4.19%.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.