Challenges and a Novel Approach for Image Captioning Using Neural Network and Searching Techniques
Keywords:
Beam search, computer vision, convolutional neural networks, Flickr8k dataset, image captioning, long short-term memory models, natural language processing
Abstract
Image captioning, the task of generating natural language descriptions of images, is a difficult problem at the intersection of computer vision and natural language processing. Despite major advances in recent years, challenges remain, such as handling rare words, generating novel and creative captions, and modeling long-term dependencies. In this paper, we present a novel image captioning method that addresses these challenges by combining convolutional neural networks (CNNs) with long short-term memory (LSTM) models. A pre-trained CNN extracts features from the input image, and an LSTM generates the caption from those features. To address the problem of rare words, we use beam search with a penalty term on the generation of rare words. We evaluate our method on the Flickr8k dataset, where our model surpasses state-of-the-art techniques in caption quality and diversity. The method has applications in image retrieval, visual question answering, and image captioning, among other areas. Overall, our approach offers a viable path forward for AI-based image captioning.
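The decoding step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the penalty, so the code below assumes one plausible variant: a fixed amount (`penalty_weight`) is subtracted from a hypothesis's log-probability each time it emits a word whose training-corpus count falls below `rare_threshold`. The names `beam_search`, `toy_model`, `counts`, `penalty_weight`, and `rare_threshold` are hypothetical, and `toy_model` is a hand-written lookup table standing in for the CNN-LSTM decoder, not the authors' model.

```python
import math
from typing import Callable, Dict, List, Tuple

START, END = "<start>", "<end>"


def beam_search(
    next_word_probs: Callable[[List[str]], Dict[str, float]],
    word_counts: Dict[str, int],
    beam_width: int = 3,
    max_len: int = 10,
    penalty_weight: float = 0.5,
    rare_threshold: int = 5,
) -> List[str]:
    """Beam search whose score is the log-probability of the caption
    minus an assumed fixed penalty for every rare word it contains."""
    beams: List[Tuple[float, List[str]]] = [(0.0, [START])]
    finished: List[Tuple[float, List[str]]] = []
    for _ in range(max_len):
        candidates: List[Tuple[float, List[str]]] = []
        for score, words in beams:
            for word, prob in next_word_probs(words).items():
                if prob <= 0.0:
                    continue  # log(0) is undefined; skip impossible words
                new_score = score + math.log(prob)
                # Assumed penalty form: subtract a constant for rare words.
                if word != END and word_counts.get(word, 0) < rare_threshold:
                    new_score -= penalty_weight
                candidates.append((new_score, words + [word]))
        if not candidates:
            break
        # Keep the best `beam_width` hypotheses; set completed ones aside.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, words in candidates[:beam_width]:
            (finished if words[-1] == END else beams).append((score, words))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_len still compete
    best_words = max(finished, key=lambda c: c[0])[1]
    return [w for w in best_words if w not in (START, END)]


def toy_model(words: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for the decoder: next-word probabilities
    that depend only on the previous word."""
    table = {
        "<start>": {"a": 0.9, "the": 0.1},
        "a": {"dog": 0.45, "corgi": 0.55},
        "the": {"dog": 1.0},
        "dog": {"runs": 0.8, "<end>": 0.2},
        "corgi": {"runs": 0.8, "<end>": 0.2},
        "runs": {"<end>": 1.0},
    }
    return table[words[-1]]


# Hypothetical training-corpus counts; "corgi" counts as rare.
counts = {"a": 1000, "the": 1000, "dog": 200, "corgi": 2, "runs": 150}
```

In this toy setup, calling `beam_search(toy_model, counts, penalty_weight=0.0)` selects the caption containing the rare word "corgi", while `penalty_weight=0.5` shifts the choice to the more common "dog". In practice the penalty weight and rarity threshold would need tuning on held-out captions.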
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Under this license, users must give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.