A Robust LSTM Model for Video Detection and Classification: An Optimization Procedure

Authors

  • Sunitha Sabbu, Research Scholar, Department of CSE, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Andhra Pradesh, India.
  • Vithya Ganesan, Professor, Department of CSE, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Andhra Pradesh, India.

Keywords

Activation Function, Batch Normalization, Deep Temporal Features, Human Activity Recognition, Long Short-Term Memory, Video Detection

Abstract

Human activity recognition (HAR) plays a vital role in a wide range of fields, such as monitoring and healthcare. Researchers now use various machine learning and deep learning (DL) techniques to analyze and interpret the collected data, and DL techniques in particular have improved the performance of HAR systems by extracting high-level features. This work introduces a novel methodology for recognizing abnormal human activities such as abuse, assault, arson, arrest, and fighting. The procedure utilizes a VGG16-based LSTM neural network: the proposed methodology combines features from the LSTM layers to generate a representation that further enhances the accuracy of the classification task. Because human action recognition and video detection are complex, training a model can be very expensive. The goal of this paper is therefore to minimize training time and improve accuracy by implementing a low-cost LSTM structure for video detection. The paper presents an LSTM structure, called Context-LSTM, that processes deep temporal features and performs well after hyperparameter tuning when validated on the entire UCF101 dataset. The proposed structure reduces training time while maintaining top-rated accuracy and minimizing GPU memory usage. The classification model categorized the "fighting," "arson," "abuse," and "arrest" class labels with 95.43%, 95.97%, and 97.37% accuracy, respectively. Overall, the proposed model achieved a test accuracy of 95.81%, with a misclassification error rate of 4.19%.
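
To make the described pipeline concrete, the sketch below shows one plausible way to combine a frozen VGG16 backbone with an LSTM head for video classification in Keras. It is a minimal illustration, assuming the TensorFlow/Keras API; the frame count, input resolution, hidden size, dropout rate, and the five class labels are illustrative assumptions, not the authors' reported Context-LSTM configuration.

    # A minimal sketch of a VGG16 + LSTM video classifier (assumed Keras API).
    # All sizes and labels below are illustrative assumptions, not the
    # authors' reported Context-LSTM configuration.
    import tensorflow as tf
    from tensorflow.keras import layers, models

    NUM_FRAMES = 16               # assumed number of frames sampled per clip
    FRAME_SHAPE = (224, 224, 3)   # VGG16's standard input resolution
    NUM_CLASSES = 5               # e.g. abuse, arrest, arson, assault, fighting

    # Frozen VGG16 backbone extracts a 512-d spatial feature per frame.
    vgg16 = tf.keras.applications.VGG16(
        include_top=False, weights="imagenet",
        input_shape=FRAME_SHAPE, pooling="avg")
    vgg16.trainable = False

    model = models.Sequential([
        tf.keras.Input(shape=(NUM_FRAMES, *FRAME_SHAPE)),
        # Apply the CNN to every frame of the clip independently.
        layers.TimeDistributed(vgg16),
        # Normalize the per-frame features before temporal modeling.
        layers.BatchNormalization(),
        # The LSTM aggregates per-frame features into one temporal representation.
        layers.LSTM(256),
        layers.Dropout(0.5),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    model.summary()

Freezing the convolutional backbone so that only the recurrent and dense layers are trained is one straightforward way to keep training time and GPU memory usage low, in line with the paper's stated goal.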

Published

24.03.2024

How to Cite

Sabbu, S., & Ganesan, V. (2024). A Robust LSTM Model for Video Detection and Classification: An Optimization Procedure. International Journal of Intelligent Systems and Applications in Engineering, 12(19s), 505–511. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/5092

Issue

Vol. 12 No. 19s

Section

Research Article