Multipose Attire Fit-in using Deep Neural Network Frameworks

Authors

  • Chandrashekhara K T, Gireesh Babu C N, Vijaykumar Gurani, Ashwini N, Bhavya G, Sumith S

Keywords

Deep Learning, Virtual Try-on, Apparel Try-on, Pose Estimation

Abstract

Creating an image-based virtual try-on system that seamlessly fits in-shop clothing onto a reference person across arbitrary poses is a significant challenge. Previous efforts have concentrated on preserving intricate clothing details such as textures, logos, and patterns while transferring a desired garment onto a target person in a fixed pose; when extended to multi-pose scenarios, however, the performance of these methods declines notably. This paper introduces an end-to-end solution, the Multi-Pose Virtual Try-On Network, designed to fit desired clothing onto a reference person in arbitrary poses. The virtual try-on process comprises three sub-modules. First, a Semantic Prediction Module (SPM) generates a semantic map of the person wearing the desired clothing; this predicted map guides the localization of the clothing region and yields a preliminary (coarse) try-on image. Second, a warping module deforms the clothing to the desired shape based on the predicted semantic map and target pose; a conductible cycle consistency loss is incorporated to address misalignment during warping. Finally, a Try-On Module combines the coarse result and the warped clothes to produce the final virtual try-on image, preserving intricate details and aligning with the desired pose. Additionally, a face identity loss refines facial appearance while maintaining the identity of the person in the try-on result. The proposed method is evaluated on a large multi-pose dataset and outperforms state-of-the-art methods. Qualitative and quantitative experiments show that the system is robust to data noise, including background changes and accessories such as hats and handbags, demonstrating its scalability and effectiveness in real-world scenarios.
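
To make the three-stage pipeline concrete, here is a minimal PyTorch sketch of the SPM, the warping module, and the Try-On Module, plus a round-trip reading of the cycle consistency loss. All of it is illustrative: the layer widths, the 18-channel pose heatmaps, the 20-class parsing labels, and the flow-based warping are assumptions made for the sketch, not the authors' published implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

POSE_CHANNELS = 18      # assumption: OpenPose-style keypoint heatmaps
NUM_PARSE_CLASSES = 20  # assumption: human-parsing label set

def conv_block(in_ch, out_ch):
    """3x3 convolution followed by ReLU, the only building block used here."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1),
        nn.ReLU(inplace=True),
    )

class SemanticPredictionModule(nn.Module):
    """Stage 1 (SPM): predict a semantic map of the reference person
    wearing the desired clothing in the target pose."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            conv_block(3 + 3 + POSE_CHANNELS, 64),
            conv_block(64, 64),
            nn.Conv2d(64, NUM_PARSE_CLASSES, 1),
        )

    def forward(self, person, clothing, pose):
        # Per-pixel class logits over the human-parsing labels.
        return self.net(torch.cat([person, clothing, pose], dim=1))

class ClothWarpingModule(nn.Module):
    """Stage 2: regress a dense flow field that warps the in-shop clothing
    toward the shape implied by the semantic map and target pose."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            conv_block(3 + NUM_PARSE_CLASSES + POSE_CHANNELS, 64),
            conv_block(64, 64),
            nn.Conv2d(64, 2, 1),  # per-pixel (dx, dy) offsets
        )

    def forward(self, clothing, semantic_map, pose):
        flow = self.net(torch.cat([clothing, semantic_map, pose], dim=1))
        b, _, h, w = flow.shape
        # Identity sampling grid in [-1, 1], displaced by the predicted flow.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=flow.device),
            torch.linspace(-1, 1, w, device=flow.device),
            indexing="ij",
        )
        grid = torch.stack([xs, ys], dim=-1).expand(b, h, w, 2)
        grid = grid + flow.permute(0, 2, 3, 1)
        return F.grid_sample(clothing, grid, align_corners=True)

class TryOnModule(nn.Module):
    """Stage 3: fuse the coarse try-on result with the warped clothing via
    a learned composition mask, preserving textures, logos, and patterns."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            conv_block(3 + 3, 64),
            nn.Conv2d(64, 4, 1),  # 3 refined RGB channels + 1 mask channel
        )

    def forward(self, coarse, warped_clothing):
        out = self.net(torch.cat([coarse, warped_clothing], dim=1))
        rgb, mask = out[:, :3], torch.sigmoid(out[:, 3:4])
        return mask * warped_clothing + (1 - mask) * rgb

def cycle_consistency_loss(warp, clothing, sem_src, pose_src, sem_tgt, pose_tgt):
    """One plausible reading of the cycle constraint: clothing warped to the
    target pose and back again should reproduce the original clothing."""
    forward = warp(clothing, sem_tgt, pose_tgt)
    backward = warp(forward, sem_src, pose_src)
    return F.l1_loss(backward, clothing)
```

A forward pass on dummy tensors shows how the stages compose; here `person` simply stands in for the coarse render that the real first stage would produce:

```python
spm, warp, tryon = SemanticPredictionModule(), ClothWarpingModule(), TryOnModule()
person = torch.randn(1, 3, 256, 192)
clothing = torch.randn(1, 3, 256, 192)
pose = torch.randn(1, POSE_CHANNELS, 256, 192)

semantic_map = spm(person, clothing, pose)
warped = warp(clothing, semantic_map, pose)
result = tryon(person, warped)  # (1, 3, 256, 192)
```

The learned composition mask in the Try-On Module is a common design choice in try-on networks: warped clothing pixels pass through largely unchanged, which is how fine textures and logos survive the refinement stage.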




Published

12.06.2024

How to Cite

Chandrashekhara K T. (2024). Multipose Attire Fit-in using Deep Neural Network Frameworks. International Journal of Intelligent Systems and Applications in Engineering, 12(4), 1633–1642. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/6461

Issue

12(4), 2024
Section

Research Article