Exploring Challenges and Advances in Human Pose Estimation: An Investigation into Deep Learning Research and Artificial Intelligence
Keywords:
human pose, pose estimation, single person pose, deep learning, pose detection, multi person pose
Abstract
Human Pose Estimation (HPE) is the task of detecting and localising key anatomical features of the human body, such as the body skeleton, in images and videos. Over the last few decades it has attracted considerable interest and has been used in a wide variety of applications, including human-computer interaction, animation, motion analysis, augmented reality, and virtual reality. HPE can be divided into several categories: single-person pose estimation, multi-person pose estimation, pose estimation in videos, and pose estimation in crowded scenes. Depending on the application, the output of pose estimation is expressed as either 2D or 3D coordinates. When a pose is estimated in two or three dimensions, joint angles are used. Pose estimation is made more difficult by small and barely visible joints, strong articulations, occlusions, clothing, and changes in illumination. To address these problems, deep learning-based CNN models have made substantial progress in human pose estimation. The goal of this survey is to provide a systematic analysis and comparison of existing deep learning-based solutions for both 2D and 3D pose estimation, organised by their input data. We review more than 50 studies covering models for single-person and multi-person pose estimation.
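As an illustration of the coordinate output and the joint angles mentioned above, the following is a minimal sketch, not a method taken from the surveyed papers. It assumes a pose is stored as a dictionary of named keypoints; the keypoint names, the coordinate values, and the helper `joint_angle` are all hypothetical and chosen only for this example.

```python
import math

# Hypothetical 2D keypoints (x, y) in pixel coordinates.
# Names and values are illustrative assumptions, not data from the survey.
pose_2d = {
    "shoulder": (0.0, 0.0),
    "elbow": (1.0, 0.0),
    "wrist": (1.0, 1.0),
}


def joint_angle(a, b, c):
    """Return the angle in degrees at joint b between segments b->a and b->c.

    Works for 2D or 3D points, as long as all three have the same dimension.
    """
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0.0:
        raise ValueError("degenerate limb: two keypoints coincide")
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


elbow_angle = joint_angle(pose_2d["shoulder"], pose_2d["elbow"], pose_2d["wrist"])
```

The same helper accepts 3D keypoints such as `(x, y, z)` tuples, since it only relies on vector differences and norms (passing three values to `math.hypot` requires Python 3.8 or later).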
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Under this license, readers must give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.