AI-Based Surveillance Framework for Physical Violence Detection
Keywords:
Deep Learning, Human Action Recognition, 3D Convolutional Neural Networks, Long Short-Term Memory (LSTM), Physical Abuse Detection, Transfer Learning, Video Surveillance Analysis, Performance Metrics in Machine Learning

Abstract
This paper presents an approach to detecting physical abuse, a societal problem that affects many demographic groups, including children, women, and older people, particularly in domestic and workplace environments. Such situations are difficult to detect, especially when the abuser and victim know each other, which motivates an advanced automated solution. The paper introduces a hybrid deep-learning framework to detect and help prevent physical abuse. The framework performs human action recognition, leveraging a 3D convolutional neural network (CNN) to analyze human actions in surveillance video. The model is further enhanced through transfer learning with ResNet-18 and GoogleNet, trained on the UBI-Fights and UCF-Crime datasets, two public video-analysis resources, to identify instances of physical abuse. A key element of the model is the transformation of 2D kernels into 3D kernels, which improves feature extraction in both the spatial and temporal dimensions of the video data. Additionally, a bilinear Long Short-Term Memory (LSTM) layer is integrated into the model to capture longer-range temporal information, further improving the analysis of human actions. The results of this hybrid model in detecting physical abuse are promising, showing marked improvements in performance metrics attributable to the shift from 2D to 3D kernels and the inclusion of the bilinear LSTM.
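The abstract does not specify how the 2D kernels are turned into 3D kernels; a common way to do this (the I3D-style "inflation" trick, assumed here for illustration, not taken from the paper) is to replicate each pretrained 2D kernel T times along a new temporal axis and scale the weights by 1/T, so that on a temporally constant clip the 3D convolution reproduces the original 2D response. A minimal dependency-free sketch:

```python
def inflate_kernel(kernel_2d, t):
    """Inflate a 2D kernel (list of rows) into a 3D kernel by stacking
    t copies along a new temporal axis, each scaled by 1/t.  On a
    temporally constant input, the 3D response then equals the
    original 2D response, preserving the pretrained behavior."""
    return [[[w / t for w in row] for row in kernel_2d] for _ in range(t)]

def response(kernel, patch):
    """Dot product of a kernel with a same-shaped patch (2D or 3D)."""
    if isinstance(kernel[0][0], list):  # 3D case: iterate over frames
        return sum(w * x
                   for k_frame, p_frame in zip(kernel, patch)
                   for k_row, p_row in zip(k_frame, p_frame)
                   for w, x in zip(k_row, p_row))
    return sum(w * x
               for k_row, p_row in zip(kernel, patch)
               for w, x in zip(k_row, p_row))

# A (hypothetical) pretrained 2D kernel and a spatial patch
k2 = [[1.0, -1.0], [0.5, 0.25]]
patch = [[2.0, 3.0], [4.0, 1.0]]

k3 = inflate_kernel(k2, t=4)       # give the kernel a 4-frame temporal extent
video_patch = [patch] * 4          # temporally constant 4-frame clip
print(response(k2, patch))         # 2D response: 1.25
print(response(k3, video_patch))   # identical 3D response: 1.25
```

In a real network this inflation would be applied to every convolutional layer of the pretrained 2D backbone (e.g., ResNet-18), after which the 3D weights are fine-tuned on video so the temporal dimension learns motion-specific features.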
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.