Unveiling the Resilience of Image Captioning Models and the Influence of Pre-trained Models on Deep Learning Performance

Authors

  • Namrata Kharate, Assistant Professor, Department of Computer Engineering, Vishwakarma Institute of Information Technology, Pune, India.
  • Sanket Patil, Student, Department of Computer Engineering, Vishwakarma Institute of Information Technology, Pune, India.
  • Pallavi Shelke, Student, Department of Computer Engineering, Vishwakarma Institute of Information Technology, Pune, India.
  • Gitanjali Shinde, Assistant Professor, Department of Computer Engineering, Vishwakarma Institute of Information Technology, Pune, India.
  • Parikshit Mahalle, Professor, Department of Artificial Intelligence and Data Science, Vishwakarma Institute of Information Technology, Pune, India.
  • Nilesh Sable, Assistant Professor, Department of Information Technology, Vishwakarma Institute of Information Technology, Pune, India.
  • Pranali G. Chavhan, Assistant Professor, Department of Computer Engineering, Vishwakarma Institute of Information Technology, Pune, India.

Keywords:

Convolutional Neural Networks, Recurrent Neural Network, Image Captioning, Computer Vision, Deep Learning

Abstract

Image captioning is a difficult challenge at the intersection of computer vision and natural language processing, as it requires generating a descriptive sentence for a given image. Deep learning approaches have recently shown promising results in this area, with encoder-decoder models being widely adopted. This paper introduces a deep learning approach to image captioning that combines convolutional neural networks (CNNs) and recurrent neural networks (RNNs). Our method uses a pre-trained CNN to extract image features, which are then fed into an RNN-based language model that generates the corresponding captions. We evaluated our approach on the Flickr8k benchmark dataset and achieved state-of-the-art results in terms of BLEU scores. We also conducted a thorough analysis of our approach and demonstrated its effectiveness in generating accurate and diverse captions. Overall, the proposed approach represents a significant advance in image captioning with deep learning and holds promise for numerous applications, including image retrieval and assistive technology for the visually impaired.
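The encoder-decoder pipeline described in the abstract can be illustrated with a minimal sketch. This is a generic CNN-feature-to-RNN-decoder model, not the authors' exact architecture; the class name, dimensions, and vocabulary size are hypothetical, and the image features are stand-ins for the pooled output of a pre-trained CNN.

```python
import torch
import torch.nn as nn

class CaptionDecoder(nn.Module):
    """RNN language model that generates caption logits from CNN image features."""
    def __init__(self, feature_dim, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.init_h = nn.Linear(feature_dim, hidden_dim)  # image features -> initial LSTM state
        self.embed = nn.Embedding(vocab_size, embed_dim)  # word indices -> dense vectors
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)       # hidden states -> vocabulary logits

    def forward(self, features, captions):
        # Condition the decoder on the image by initializing the LSTM state from the features.
        h0 = torch.tanh(self.init_h(features)).unsqueeze(0)  # (1, batch, hidden)
        c0 = torch.zeros_like(h0)
        emb = self.embed(captions)                           # (batch, seq_len, embed)
        out, _ = self.rnn(emb, (h0, c0))
        return self.fc(out)                                  # (batch, seq_len, vocab)

# Stand-in for pre-trained CNN features (e.g. a 2048-d pooled vector per image)
features = torch.randn(4, 2048)
captions = torch.randint(0, 5000, (4, 12))  # token indices of the ground-truth captions
model = CaptionDecoder(feature_dim=2048, vocab_size=5000)
logits = model(features, captions)          # shape: (4, 12, 5000)
```

At training time the logits are compared against the shifted ground-truth tokens with cross-entropy; at inference the decoder is unrolled one token at a time, feeding each predicted word back in until an end-of-sentence token is produced.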



Published

12.07.2023

How to Cite

Kharate, N., Patil, S., Shelke, P., Shinde, G., Mahalle, P., Sable, N., & Chavhan, P. G. (2023). Unveiling the Resilience of Image Captioning Models and the Influence of Pre-trained Models on Deep Learning Performance. International Journal of Intelligent Systems and Applications in Engineering, 11(9s), 01–07. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/3089

Issue

Section

Research Article