Exploring Object Detection Algorithms and Implementation of a YOLOv7- and YOLOv8-Based Model for Weapon Detection
Keywords: anchor box, classifier, feature map, RCNN, YOLO
Abstract
This paper explores the working principles, performance metrics, and architectural nuances of various object detection techniques, focusing mainly on the YOLO (You Only Look Once) family of algorithms. A comprehensive comparative study is conducted, considering factors such as loss function, backbone network, and performance on standardized image sizes. Beginning with an introduction, the paper classifies object detection algorithms into one-stage and two-stage techniques. The literature review scrutinizes the operational mechanisms and constraints of existing techniques. The study then transitions into weapon detection using the YOLOv7 and YOLOv8 algorithms, leveraging a dataset sourced and pre-processed from the Roboflow website. The mean Average Precision (mAP) achieved by YOLOv7 and YOLOv8 after training for 50 epochs stands at 0.9289 and 0.9430, respectively. Furthermore, the paper shows how the performance metrics of YOLOv7 vary with epoch count. In conclusion, the paper outlines avenues for further research, highlighting areas that warrant attention and exploration within the realm of object detection methodologies.
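The mAP values reported above are built from per-class average precision, which in turn rests on IoU-based matching of predicted and ground-truth boxes followed by integration of the precision-recall curve. The sketch below (plain Python, illustrative only; it is not the paper's evaluation code, and the box coordinates and PR values are invented) shows the two core computations:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp to zero so disjoint boxes yield no intersection area.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(recalls, precisions):
    """All-point interpolated AP: area under the precision envelope.

    `recalls` must be sorted ascending; both lists come from sweeping
    the detector's confidence threshold.
    """
    mrec = [0.0] + list(recalls) + [1.0]
    mpre = [0.0] + list(precisions) + [0.0]
    # Make precision monotonically non-increasing from right to left.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangular areas between consecutive recall points.
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]
               for i in range(len(mrec) - 1))


# Toy example: one overlapping pair and a short PR curve.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))           # 1/7 ≈ 0.1429
print(average_precision([0.5, 1.0], [1.0, 0.8]))  # 0.9
```

mAP is then simply this AP averaged over all classes (here, weapon categories), typically at a fixed IoU threshold such as 0.5.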
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.