AI-Based Surveillance Framework for Physical Violence Detection

Authors

  • Srividya M. S., Anala M. R.

Keywords

Deep Learning, Human Action Recognition, 3D Convolutional Neural Networks, Long Short-Term Memory (LSTM), Physical Abuse Detection, Transfer Learning, Video Surveillance Analysis, Performance Metrics in Machine Learning

Abstract

This research paper addresses the societal problem of physical abuse, which affects various demographic groups, including children, women, and older people, particularly in domestic and workplace environments. The complexity of these situations, especially when the abuser and victim know each other, highlights the need for an automated solution. The paper introduces a novel hybrid deep-learning framework to detect physical abuse through human action recognition, leveraging a 3D convolutional neural network (CNN) to analyze human actions in surveillance video. The model is further enhanced through transfer learning with ResNet-18 and GoogleNet, trained on the public UBI-Fights and UCF-Crime video datasets to identify instances of physical abuse. A significant innovation in this model is the transformation of 2D kernels into 3D kernels, which allows improved extraction of features in both the temporal and spatial dimensions of the video data. Additionally, a bilinear Long Short-Term Memory (LSTM) layer is integrated into the model to capture more extended temporal information, thus improving the analysis of human actions. The results of this hybrid model in detecting physical abuse are promising, showing marked improvements in performance metrics attributable to the shift from 2D to 3D kernels and the inclusion of the bilinear LSTM.
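The abstract does not give implementation details for the 2D-to-3D kernel transformation it describes. As an illustration only, the sketch below shows the common "inflation" approach for reusing pretrained 2D filters in a 3D CNN: each 2D kernel is replicated along a new temporal axis and rescaled. The function name `inflate_kernel` and the example layer shape (modeled on ResNet-18's first convolution) are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def inflate_kernel(w2d: np.ndarray, time_dim: int = 3) -> np.ndarray:
    """Inflate 2D conv weights (out_ch, in_ch, kH, kW) into 3D weights
    (out_ch, in_ch, T, kH, kW).

    Each 2D kernel is replicated T times along a new temporal axis and
    divided by T, so a temporally constant input produces the same
    response as the original 2D filter.
    """
    w3d = np.repeat(w2d[:, :, np.newaxis, :, :], time_dim, axis=2)
    return w3d / time_dim

# Example: weights shaped like ResNet-18's first conv layer
# (64 filters, 3 input channels, 7x7 spatial kernel), inflated to
# a 3x7x7 spatio-temporal kernel.
w2d = np.random.randn(64, 3, 7, 7)
w3d = inflate_kernel(w2d, time_dim=3)
```

Because the replicated slices are each divided by the temporal depth, summing the inflated kernel over its time axis recovers the original 2D weights, which is what lets the pretrained spatial features transfer into the 3D network.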


References

Y. Cao et al., “Recognize Human Activities from Partially Observed Videos,” in 2013 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Jun. 2013, pp. 2658–2665. doi: 10.1109/CVPR.2013.343.

C. Nolker and H. Ritter, “Visual recognition of continuous hand postures,” IEEE Trans Neural Netw, vol. 13, no. 4, pp. 983–994, Jul. 2002, doi: 10.1109/TNN.2002.1021898.

E. Ueda, Y. Matsumoto, M. Imai, and T. Ogasawara, “A hand-pose estimation for vision-based human interfaces,” IEEE Transactions on Industrial Electronics, vol. 50, no. 4, pp. 676–684, Aug. 2003, doi: 10.1109/TIE.2003.814758.

S. Mitra and T. Acharya, “Gesture Recognition: A Survey,” IEEE Transactions on Systems, Man and Cybernetics, Part C (Applications and Reviews), vol. 37, no. 3, pp. 311–324, May 2007, doi: 10.1109/TSMCC.2007.893280.

A. Mumtaz, A. B. Sargano, and Z. Habib, “Robust learning for real-world anomalies in surveillance videos,” Multimed Tools Appl, vol. 82, no. 13, pp. 20303–20322, May 2023, doi: 10.1007/s11042-023-14425-x.

Y. Zhu, Z. Yang, and B. Yuan, “Vision-Based Hand Gesture Recognition,” in 2013 International Conference on Service Sciences (ICSS), IEEE, Apr. 2013, pp. 260–265. doi: 10.1109/ICSS.2013.40.

S. Waheed, R. Amin, J. Iqbal, M. Hussain, and M. A. Bashir, “An Automated Human Action Recognition and Classification Framework Using Deep Learning,” in 2023 4th International Conference on Computing, Mathematics and Engineering Technologies (iCoMET), IEEE, Mar. 2023, pp. 1–5. doi: 10.1109/iCoMET57998.2023.10099190.

Z. Sun, Q. Ke, H. Rahmani, M. Bennamoun, G. Wang, and J. Liu, “Human Action Recognition From Various Data Modalities: A Review,” IEEE Trans Pattern Anal Mach Intell, pp. 1–20, 2022, doi: 10.1109/TPAMI.2022.3183112.

D. Liang and E. Thomaz, “Audio-Based Activities of Daily Living (ADL) Recognition with Large-Scale Acoustic Embeddings from Online Videos,” Proc ACM Interact Mob Wearable Ubiquitous Technol, vol. 3, no. 1, pp. 1–18, Mar. 2019, doi: 10.1145/3314404.

D. Ganesh, R. R. Teja, C. D. Reddy, and D. Swathi, “Human Action Recognition based on Depth maps, Skeleton and Sensor Images using Deep Learning,” in 2022 IEEE 3rd Global Conference for Advancement in Technology (GCAT), IEEE, Oct. 2022, pp. 1–8. doi: 10.1109/GCAT55367.2022.9971982.

M. Dallel, V. Havard, D. Baudry, and X. Savatier, “InHARD - Industrial Human Action Recognition Dataset in the Context of Industrial Collaborative Robotics,” in 2020 IEEE International Conference on Human-Machine Systems (ICHMS), IEEE, Sep. 2020, pp. 1–6. doi: 10.1109/ICHMS49158.2020.9209531.

J.-S. Kim, “Efficient Human Action Recognition with Dual-Action Neural Networks for Virtual Sports Training,” in 2022 IEEE International Conference on Consumer Electronics-Asia (ICCE-Asia), IEEE, Oct. 2022, pp. 1–3. doi: 10.1109/ICCE-Asia57006.2022.9954758.

P. Le Noury, R. Polman, M. Maloney, and A. Gorman, “A Narrative Review of the Current State of Extended Reality Technology and How it can be Utilised in Sport,” Sports Medicine, vol. 52, no. 7, pp. 1473–1489, Jul. 2022, doi: 10.1007/s40279-022-01669-0.

N. Jaouedi, N. Boujnah, and M. S. Bouhlel, “A new hybrid deep learning model for human action recognition,” Journal of King Saud University - Computer and Information Sciences, vol. 32, no. 4, pp. 447–453, May 2020, doi: 10.1016/j.jksuci.2019.09.004.

H.-H. Pham, L. Khoudour, A. Crouzil, P. Zegers, and S. A. Velastin, “Exploiting deep residual networks for human action recognition from skeletal data,” Computer Vision and Image Understanding, vol. 170, pp. 51–66, May 2018, doi: 10.1016/j.cviu.2018.03.003.

Z. Cao, T. Simon, S.-E. Wei, and Y. Sheikh, “Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Jul. 2017, pp. 1302–1310. doi: 10.1109/CVPR.2017.143.

K. Yun, J. Honorio, D. Chattopadhyay, T. L. Berg, and D. Samaras, “Two-person interaction detection using body-pose features and multiple instance learning,” in 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, IEEE, Jun. 2012, pp. 28–35. doi: 10.1109/CVPRW.2012.6239234.

R. A. Guler, N. Neverova, and I. Kokkinos, “DensePose: Dense Human Pose Estimation in the Wild,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Jun. 2018, pp. 7297–7306. doi: 10.1109/CVPR.2018.00762.

H. Y. Jung, S. Lee, Y. S. Heo, and I. D. Yun, “Random tree walk toward instantaneous 3D human pose estimation,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Jun. 2015, pp. 2467–2474. doi: 10.1109/CVPR.2015.7298861.

H. C. Altunay and Z. Albayrak, “A hybrid CNN+LSTM-based intrusion detection system for industrial IoT networks,” Engineering Science and Technology, an International Journal, vol. 38, p. 101322, Feb. 2023, doi: 10.1016/j.jestch.2022.101322.

A. Raza et al., “A Hybrid Deep Learning-Based Approach for Brain Tumor Classification,” Electronics (Basel), vol. 11, no. 7, p. 1146, Apr. 2022, doi: 10.3390/electronics11071146.

R. Tandon, S. Agrawal, A. Chang, and S. S. Band, “VCNet: Hybrid Deep Learning Model for Detection and Classification of Lung Carcinoma Using Chest Radiographs,” Front Public Health, vol. 10, Jun. 2022, doi: 10.3389/fpubh.2022.894920.

V. Hnamte, H. Nhung-Nguyen, J. Hussain, and Y. Hwa-Kim, “A Novel Two-Stage Deep Learning Model for Network Intrusion Detection: LSTM-AE,” IEEE Access, vol. 11, pp. 37131–37148, 2023, doi: 10.1109/ACCESS.2023.3266979.

Mst. A. Khatun et al., “Deep CNN-LSTM With Self-Attention Model for Human Activity Recognition Using Wearable Sensor,” IEEE J Transl Eng Health Med, vol. 10, pp. 1–16, 2022, doi: 10.1109/JTEHM.2022.3177710.

B. Lindemann, B. Maschler, N. Sahlab, and M. Weyrich, “A survey on anomaly detection for technical systems using LSTM networks,” Comput Ind, vol. 131, p. 103498, Oct. 2021, doi: 10.1016/j.compind.2021.103498.

A. S. Musleh, G. Chen, Z. Y. Dong, C. Wang, and S. Chen, “Attack Detection in Automatic Generation Control Systems using LSTM-Based Stacked Autoencoders,” IEEE Trans Industr Inform, vol. 19, no. 1, pp. 153–165, Jan. 2023, doi: 10.1109/TII.2022.3178418.

E. Mushtaq, A. Zameer, M. Umer, and A. A. Abbasi, “A two-stage intrusion detection system with auto-encoder and LSTMs,” Appl Soft Comput, vol. 121, p. 108768, May 2022, doi: 10.1016/j.asoc.2022.108768.

M. Mahmoud, M. Kasem, A. Abdallah, and H. S. Kang, “AE-LSTM: Autoencoder with LSTM-Based Intrusion Detection in IoT,” in 2022 International Telecommunications Conference (ITC-Egypt), IEEE, Jul. 2022, pp. 1–6. doi: 10.1109/ITC-Egypt55520.2022.9855688.

C. Vondrick, H. Pirsiavash, and A. Torralba, “Anticipating Visual Representations from Unlabeled Video,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Jun. 2016, pp. 98–106. doi: 10.1109/CVPR.2016.18.

M. Ziaeefard, R. Bergevin, and L.-P. Morency, “Time-slice Prediction of Dyadic Human Activities,” in Proceedings of the British Machine Vision Conference 2015, British Machine Vision Association, 2015, pp. 167.1-167.13. doi: 10.5244/C.29.167.

M. S. Ryoo, “Human activity prediction: Early recognition of ongoing activities from streaming videos,” in 2011 International Conference on Computer Vision, IEEE, Nov. 2011, pp. 1036–1043. doi: 10.1109/ICCV.2011.6126349.

F. J. Rendón-Segador, J. A. Álvarez-García, J. L. Salazar-González, and T. Tommasi, “CrimeNet: Neural Structured Learning using Vision Transformer for violence detection,” Neural Networks, vol. 161, pp. 318–329, Apr. 2023, doi: 10.1016/j.neunet.2023.01.048.

M. A. B. Abbass and H.-S. Kang, “Violence Detection Enhancement by Involving Convolutional Block Attention Modules Into Various Deep Learning Architectures: Comprehensive Case Study for UBI-Fights Dataset,” IEEE Access, vol. 11, pp. 37096–37107, 2023, doi: 10.1109/ACCESS.2023.3267409.

C. Leng, Q. Ding, C. Wu, and A. Chen, “Augmented two-stream network for robust action recognition adaptive to various action videos,” J Vis Commun Image Represent, vol. 81, p. 103344, Nov. 2021, doi: 10.1016/j.jvcir.2021.103344.

M. Z. Zaheer, A. Mahmood, H. Shin, and S.-I. Lee, “A Self-Reasoning Framework for Anomaly Detection Using Video-Level Labels,” IEEE Signal Process Lett, vol. 27, pp. 1705–1709, 2020, doi: 10.1109/LSP.2020.3025688.

Published

01.04.2024

How to Cite

Anala M. R., S. M. S. (2024). AI-Based Surveillance Framework for Physical Violence Detection. International Journal of Intelligent Systems and Applications in Engineering, 12(3), 1470–1481. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/5540

Section

Research Article