Automated Image Captioning Using Deep Learning

Authors

  • Chandra B., Avinash P., Sai Prasath P., Jennet Shinny D., Keshav Adhitya M.

Keywords:

Object Detection, Deep Learning, Computer Vision, YOLOv3, Convolutional Neural Networks

Abstract

Object detection, a pivotal task in computer vision, spans applications such as autonomous driving and medical imaging. Deep learning improves detection by learning hierarchical representations of the data. Two prevalent approaches are region proposal-based detectors (e.g., R-CNN, Fast R-CNN) and unified pipeline-based detectors (e.g., YOLOv2). The latter emphasize speed and simplicity, and innovations such as batch normalization and anchor boxes improve their accuracy. Real-time YOLO variants adapt the approach to specific platforms (e.g., non-GPU computers), while methods such as SSD and DSSD target the trade-off between speed and accuracy. Recent advancements include YOLOv3, which adopts a binary cross-entropy loss for class prediction and improves small-object detection.
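
The binary cross-entropy loss mentioned above can be illustrated with a minimal sketch. YOLOv3 scores each class with an independent logistic classifier, so every class is treated as its own yes/no decision. The function names and the toy inputs below are assumptions chosen for illustration and are not taken from the authors' implementation.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def binary_cross_entropy(logits, targets, eps=1e-12):
    """Mean binary cross-entropy over independent per-class logits.

    logits  -- raw class scores for one predicted box (hypothetical input)
    targets -- 0/1 labels, one per class; more than one may be 1
    """
    total = 0.0
    for z, t in zip(logits, targets):
        # Clamp the probability so log() never receives 0.
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(logits)


# An uninformed prediction (logit 0) versus a confident, correct one.
uncertain = binary_cross_entropy([0.0, 0.0], [1, 0])
confident = binary_cross_entropy([5.0, -5.0], [1, 0])
```

Because each class is scored separately, a box can carry several positive labels at once, which a softmax over classes would not allow.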

Published

27.03.2024

How to Cite

Avinash P., Sai Prasath P., Jennet Shinny D., Keshav Adhitya M., & Chandra B. (2024). Automated Image Captioning Using Deep Learning. International Journal of Intelligent Systems and Applications in Engineering, 12(3), 1418–1421. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/5534

Section

Research Article