Exploring Challenges and Advances in Human Pose Estimation: An Investigation into Deep Learning Research and Artificial Intelligence

Authors

  • Sanjeev Kulkarni, Dept. of Computer Science & Engineering, Institute of Engineering & Technology, Srinivas University, Mukka, Mangalore
  • Aishwarya Shetty, Dept. of Computer Science & Engineering, Nitte (Deemed to be University), NMAM Institute of Technology, Nitte, Karnataka, India 574110
  • Soumya Ashwath, Dept. of Computer Science & Engineering, Nitte (Deemed to be University), NMAM Institute of Technology, Nitte, Karnataka, India 574110
  • Ranjit Kolkar, Dept. of Computer Science & Engineering, Institute of Engineering & Technology, Srinivas University, Mukka, Mangalore, Karnataka, India 574146
  • Preethi Salian, Dept. of Information Science & Engineering, Nitte (Deemed to be University), NMAM Institute of Technology, Nitte, Karnataka, India 574110
  • Vishalakshi H., Department of Computer Science, KSRDPRU University, Gadag, Karnataka, India 582101

Keywords:

human pose, pose estimation, single-person pose, deep learning, pose detection, multi-person pose

Abstract

Human Pose Estimation (HPE) is the task of detecting and localising key anatomical points on the human body, such as the joints of the body skeleton, in images and videos. Over the last few decades it has attracted considerable interest and has been used in a wide range of applications, including human-computer interaction, animation, motion analysis, augmented reality, and virtual reality. HPE can be divided into several categories: single-person pose estimation, multi-person pose estimation, pose estimation in videos, and pose estimation in crowded scenes. Depending on the application, the output of pose estimation is given as either 2D or 3D coordinates. When a 3D pose is estimated from 2D input, joint angles are used. Pose estimation is made more difficult by small and barely visible joints, strong articulations, occlusions, clothing, and changes in illumination. To address these problems, deep learning-based CNN models have made substantial progress in human pose estimation. This survey provides a systematic analysis and comparison of existing deep learning-based solutions for both 2D and 3D pose estimation based on their input data. We reviewed more than 50 studies covering pose estimation models for single-person and multi-person pose estimation.
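
As a rough illustration of the 2D and 3D coordinate outputs and the joint angles mentioned in the abstract, the following sketch stores a pose as an array of keypoint coordinates and computes the angle at one joint. The joint names, the example coordinates, and the `joint_angle` helper are assumptions made for illustration only; they are not taken from any of the surveyed models or datasets.

```python
import numpy as np

# Hypothetical three-joint arm; real datasets define their own keypoint sets.
JOINTS = {"shoulder": 0, "elbow": 1, "wrist": 2}


def joint_angle(pose, parent, joint, child):
    """Return the angle in degrees at `joint` between the two adjacent limbs.

    `pose` is an array of shape (num_joints, 2) or (num_joints, 3), so the
    same function works for 2D and 3D keypoint coordinates.
    """
    a = pose[JOINTS[parent]] - pose[JOINTS[joint]]
    b = pose[JOINTS[child]] - pose[JOINTS[joint]]
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Clip to guard against rounding slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


# Illustrative coordinates (made up): an arm bent at a right angle.
pose_2d = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
pose_3d = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 1.0]])

angle_2d = joint_angle(pose_2d, "shoulder", "elbow", "wrist")
angle_3d = joint_angle(pose_3d, "shoulder", "elbow", "wrist")
```

In this sketch the only difference between the 2D and 3D cases is the number of columns per keypoint, which is why a single helper covers both.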

Published

24.03.2024

How to Cite

Kulkarni, S., Shetty, A., Ashwath, S., Kolkar, R., Salian, P., & H., V. (2024). Exploring Challenges and Advances in Human Pose Estimation: An Investigation into Deep Learning Research and Artificial Intelligence. International Journal of Intelligent Systems and Applications in Engineering, 12(19s), 347–354. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/5073

Section

Research Article
