Unveiling the Resilience of Image Captioning Models and the Influence of Pre-trained Models on Deep Learning Performance
Keywords:
Convolutional Neural Networks, Recurrent Neural Network, Image Captioning, Computer Vision, Deep Learning

Abstract
Image captioning is a challenging problem at the intersection of computer vision and natural language processing, as it requires generating a descriptive sentence for a given image. Deep learning approaches have recently shown promising results in this area, with encoder-decoder models being widely adopted. This paper introduces a deep learning approach to image captioning that combines convolutional neural networks (CNNs) and recurrent neural networks (RNNs). The proposed method uses a pre-trained CNN to extract image features, which are then fed into an RNN-based language model that generates the corresponding captions. We evaluated our approach on a benchmark dataset, Flickr8k, and achieved state-of-the-art results in terms of BLEU scores. We also conducted a thorough analysis of our approach and demonstrated its effectiveness in generating accurate and diverse captions. Overall, the proposed approach represents a significant advance in deep learning-based image captioning and holds promise for applications such as image retrieval and assistive technology for the visually impaired.
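The encoder-decoder pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the small convolutional encoder, the hidden sizes, and the class name `EncoderDecoderCaptioner` are all assumptions for the sake of a self-contained example; in practice the encoder would be a pre-trained backbone (e.g., a torchvision ResNet with its classification head removed) whose features condition the RNN language model.

```python
import torch
import torch.nn as nn

class EncoderDecoderCaptioner(nn.Module):
    """Sketch of a CNN-RNN captioner: a CNN encodes the image into a
    feature vector, which initializes an LSTM language model that is
    trained to emit the caption token by token (teacher forcing)."""

    def __init__(self, feature_dim=256, vocab_size=5000,
                 embed_dim=256, hidden_dim=512):
        super().__init__()
        # Stand-in encoder; replace with a pre-trained CNN in practice.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feature_dim),
        )
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Image features set the LSTM's initial hidden and cell states.
        self.init_h = nn.Linear(feature_dim, hidden_dim)
        self.init_c = nn.Linear(feature_dim, hidden_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        feats = self.encoder(images)              # (B, feature_dim)
        h0 = self.init_h(feats).unsqueeze(0)      # (1, B, hidden_dim)
        c0 = self.init_c(feats).unsqueeze(0)
        emb = self.embed(captions)                # (B, T, embed_dim)
        out, _ = self.lstm(emb, (h0, c0))
        return self.fc(out)                       # (B, T, vocab_size) logits

model = EncoderDecoderCaptioner()
images = torch.randn(2, 3, 224, 224)        # dummy batch of 2 RGB images
captions = torch.randint(0, 5000, (2, 10))  # dummy caption token ids
logits = model(images, captions)
print(logits.shape)  # torch.Size([2, 10, 5000])
```

Training would minimize cross-entropy between these logits and the next-token targets; at inference the decoder is run autoregressively (greedy or beam search) from a start token.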
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.