Deep Learning in Image Processing: Transforming Computer Vision
Keywords: Deep Learning, Image Processing, Vision Transformer, Computer Vision, Model Optimization

Abstract
Deep learning has become central to image processing, substantially improving the capabilities of computer vision systems. This work evaluates the image-classification performance of deep models, comparing Convolutional Neural Networks (CNNs) with Transformer-based architectures. Experiments were conducted on a diverse dataset, and preprocessing with contrast restoration and adaptive histogram equalization improved results by 4-6 percentage points. The models evaluated were ResNet-50, EfficientNet-B0, and the Vision Transformer (ViT), each trained with hyperparameter tuning in GPU-accelerated environments. ViT achieved the highest classification accuracy at 92.5%, outperforming the CNN-based models, and statistical testing confirmed that the performance differences were significant (p < 0.05). Feature-map visualizations showed well-formed hierarchical feature representations, and ViT reached an AUC above 0.97. These results indicate that transformer-based models offer substantial benefits for image classification. By examining how architecture choice and preprocessing methodology influence overall model performance, this work contributes to the advancement of deep learning for image processing.
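The abstract credits part of the gain to contrast enhancement before training. As a minimal sketch of the idea (the paper's exact preprocessing pipeline and parameters are not given here; this pure-Python example shows global histogram equalization on an 8-bit grayscale image, a simpler relative of the adaptive variant mentioned above):

```python
def equalize_histogram(image, levels=256):
    """Remap pixel intensities so the cumulative histogram is approximately linear."""
    flat = [p for row in image for p in row]
    n = len(flat)
    # Build the intensity histogram.
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    # Cumulative distribution function (CDF).
    cdf = []
    total = 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    # Standard equalization mapping: scale the CDF onto [0, levels - 1].
    lut = [round((cdf[v] - cdf_min) / (n - cdf_min) * (levels - 1))
           if n > cdf_min else v
           for v in range(levels)]
    return [[lut[p] for p in row] for row in image]

# A low-contrast 2x3 toy image: intensities clustered in [100, 104].
img = [[100, 101, 102], [102, 103, 104]]
out = equalize_histogram(img)
# The narrow intensity range is stretched across the full [0, 255] scale.
print(out)  # [[0, 51, 153], [153, 204, 255]]
```

Adaptive histogram equalization applies the same remapping per local tile (with contrast limiting in CLAHE), which preserves local detail better than this global version.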
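The abstract reports an AUC above 0.97 for ViT. For readers unfamiliar with the metric, AUC equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one. A hedged pure-Python illustration (the labels and scores below are made-up toy values, not the paper's data):

```python
def auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney) statistic; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Count positive/negative pairs where the positive outranks the negative.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Toy example: four predictions, two positives and two negatives.
labels = [1, 1, 0, 0]
scores = [0.9, 0.6, 0.7, 0.2]
# Pairs: (0.9, 0.7) and (0.9, 0.2) and (0.6, 0.2) are wins; (0.6, 0.7) is not.
print(auc(labels, scores))  # 0.75
```

A perfect classifier scores every positive above every negative and reaches AUC = 1.0; random scoring hovers around 0.5, so 0.97 indicates near-perfect ranking.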
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.