Deep Learning in Image Processing: Transforming Computer Vision

Authors

  • Sumithra M D, M. Abdul Rahiman

Keywords:

Deep Learning, Image Processing, Vision Transformer, Computer Vision, Model Optimization

Abstract

Deep learning has become a driving force in image processing, substantially improving the capabilities of computer vision systems. This work evaluates the performance of deep models on image classification using Convolutional Neural Networks (CNNs) and Transformer models. Experiments were conducted on a diverse dataset, and preprocessing with contrast photon restoration and adaptive histogram enhancement improved results by 4–6 percent. The models evaluated were ResNet-50, EfficientNet-B0, and the Vision Transformer (ViT), each trained with hyperparameter tuning in a GPU-accelerated environment. ViT achieved the highest classification accuracy, 92.5%, outperforming the CNN-based models, and a statistical test showed that the performance difference was significant (p < 0.05). Feature maps and shaped specifications showed clear hierarchical feature learning, and ViT reached an AUC above 0.97. These results indicate that transformer-based models offer significant advantages for image classification. By examining how both architecture and preprocessing choices affect overall model performance, this work contributes to the advancement of deep learning in image processing.
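The AUC figure reported for ViT summarizes how well a classifier ranks positive examples above negative ones. As a minimal sketch (not the authors' evaluation code), the function below computes binary ROC AUC with the rank-based Mann-Whitney U formulation: AUC equals the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half. The function name `roc_auc` and the toy data are hypothetical; for a multiclass task such as this one, AUC is commonly averaged over one-vs-rest binary problems, but the abstract does not state which averaging was used.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Binary ROC AUC (labels: 1 = positive, 0 = negative) via Mann-Whitney U."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    # Sort indices by score, then assign 1-based ranks, averaging over ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    # U statistic for the positive class, normalized by the number of pairs.
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    # Hypothetical scores: 3 of the 4 positive/negative pairs are ordered correctly.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The rank formulation avoids explicitly sweeping thresholds and handles tied scores correctly, which is why it is a common way to compute AUC from raw classifier outputs.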
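The abstract reports a significant difference (p < 0.05) without naming the test used. One common choice for comparing two classifiers evaluated on the same test images is McNemar's test, which looks only at the images where the two models disagree. The sketch below assumes that test purely for illustration; it is not confirmed as the authors' method, and the function name `mcnemar_exact` is hypothetical. It computes the two-sided exact (binomial) p-value.

```python
from math import comb
from typing import Sequence


def mcnemar_exact(y_true: Sequence[int], pred_a: Sequence[int],
                  pred_b: Sequence[int]) -> float:
    """Two-sided exact McNemar p-value for two classifiers on one test set."""
    if not (len(y_true) == len(pred_a) == len(pred_b)):
        raise ValueError("all sequences must have the same length")
    # b: model A correct and model B wrong; c: model A wrong and model B correct.
    # Samples where both models are right (or both wrong) carry no information.
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    n = b + c
    if n == 0:
        return 1.0
    # Under the null hypothesis, discordant pairs split as Binomial(n, 0.5).
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical predictions: model A is right on all 10 images, model B on none.
    y = [0] * 10
    print(mcnemar_exact(y, [0] * 10, [1] * 10))
```

Because both models are scored on identical images, a paired test like this is generally more appropriate than comparing two accuracy figures as if they came from independent samples.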

Published

26.01.2021

How to Cite

Sumithra M D. (2021). Deep Learning in Image Processing: Transforming Computer Vision. International Journal of Intelligent Systems and Applications in Engineering, 9(1), 72–84. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/7349

Section

Research Article