Challenges and a Novel Approach for Image Captioning Using Neural Network and Searching Techniques

Authors

  • Bharati Dixit, School of Computer Engineering and Technology, Dr. Vishwanath Karad MIT World Peace University, Pune, India
  • Rajendra G. Pawar, Associate Professor, Department of Computer Science Engineering, MIT School of Computing, MIT ADT University, Pune, India
  • Milind Gayakwad, Assistant Professor, Bharati Vidyapeeth Deemed to be University College of Engineering, Pune, India
  • Rahul Joshi, Associate Professor, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
  • Ansh Mahajan, School of Computer Engineering and Technology, Dr. Vishwanath Karad MIT World Peace University, Pune, India
  • Suyash V. Chinchmalatpure, School of Computer Engineering and Technology, Dr. Vishwanath Karad MIT World Peace University, Pune, India

Keywords:

Beam search, computer vision, convolutional neural networks, Flickr8k dataset, image captioning, long short-term memory models, natural language processing

Abstract

Image captioning, the task of generating natural language descriptions of images, is a difficult problem at the intersection of computer vision and natural language processing. Despite major advances in recent years, difficulties remain, such as handling rare words, generating novel and creative captions, and modeling long-term dependencies. In this paper, we present a novel method for image captioning that addresses these difficulties by combining convolutional neural networks (CNNs) with long short-term memory (LSTM) models. A pre-trained CNN extracts features from the input image, and an LSTM generates the caption from those features. To address the problem of rare words, we use beam search with a penalty term for generating uncommon words. We evaluate our method on the Flickr8k dataset, where our model surpasses state-of-the-art techniques in caption quality and variety. The method has applications in areas such as image retrieval and visual question answering. Overall, our approach offers a viable path forward for AI-based image captioning.
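
The decoding step described in the abstract can be illustrated with a minimal sketch of beam search with a rare-word penalty. This is not the authors' implementation: the abstract does not give the form or sign of the penalty, so the fixed subtraction `rare_penalty`, the `rare_words` set, the `<start>`/`<end>` tokens, and the `toy_model` lookup table (a stand-in for the CNN-conditioned LSTM) are all assumptions made for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Hypothetical interface: given the tokens so far, return next-word probabilities.
# In a real system this would be the LSTM conditioned on CNN image features.
NextTokenFn = Callable[[Sequence[str]], Dict[str, float]]


def beam_search(
    next_token_probs: NextTokenFn,
    rare_words: Set[str],
    beam_width: int = 3,
    max_len: int = 10,
    rare_penalty: float = 1.0,  # assumed form: fixed amount subtracted per rare word
    end_token: str = "<end>",
) -> List[str]:
    """Return the highest-scoring caption found by beam search."""
    beams: List[Tuple[float, List[str]]] = [(0.0, ["<start>"])]
    finished: List[Tuple[float, List[str]]] = []

    for _ in range(max_len):
        candidates: List[Tuple[float, List[str]]] = []
        for score, tokens in beams:
            for word, prob in next_token_probs(tokens).items():
                if prob <= 0.0:
                    continue  # log(0) is undefined; skip impossible words
                new_score = score + math.log(prob)
                if word in rare_words:
                    new_score -= rare_penalty  # assumption: rare words are penalized
                candidates.append((new_score, tokens + [word]))
        if not candidates:
            break
        # Keep only the best `beam_width` partial captions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for cand in candidates[:beam_width]:
            (finished if cand[1][-1] == end_token else beams).append(cand)
        if not beams:
            break

    pool = finished or beams  # fall back to unfinished beams if none ended
    best_tokens = max(pool, key=lambda c: c[0])[1]
    return [w for w in best_tokens if w not in ("<start>", end_token)]


def toy_model(tokens: Sequence[str]) -> Dict[str, float]:
    """Hypothetical stand-in for the captioning model: a fixed lookup table."""
    table = {
        "<start>": {"a": 1.0},
        "a": {"dog": 0.45, "borzoi": 0.55},
        "dog": {"<end>": 1.0},
        "borzoi": {"<end>": 1.0},
    }
    return table[tokens[-1]]


with_penalty = beam_search(toy_model, rare_words={"borzoi"}, rare_penalty=1.0)
without_penalty = beam_search(toy_model, rare_words={"borzoi"}, rare_penalty=0.0)
```

In this toy setup the penalty changes which caption wins: with `rare_penalty=1.0` the search prefers "a dog", and with `rare_penalty=0.0` it prefers "a borzoi". Passing a negative `rare_penalty` would instead boost rare words, which may be closer to the intended behavior if the goal is to encourage them.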

Published

16.07.2023

How to Cite

Dixit, B., Pawar, R. G., Gayakwad, M., Joshi, R., Mahajan, A., & Chinchmalatpure, S. V. (2023). Challenges and a Novel Approach for Image Captioning Using Neural Network and Searching Techniques. International Journal of Intelligent Systems and Applications in Engineering, 11(3), 712–720. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/3277

Section

Research Article