A Study on the Classification of Hand Gesture for Mobile Virtual Reality with MediaPipe


  • Beom Jun Jo, Seong Ki Kim


Augmented Reality, Barracuda, MediaPipe, Virtual Reality


Mobile application processors in virtual reality devices continue to evolve, and new technologies are emerging accordingly. Although virtual reality with hand tracking makes it possible to manipulate content without a controller, current devices support only a limited set of actions, and hand tracking can be used only in simple games. Nevertheless, hand tracking gives users an intuitive, comfortable way to operate content directly with their hands, and can help prevent accidents. For hand tracking, Google's MediaPipe enables tracking with an ordinary webcam. This paper describes how to use a new gesture for virtual reality content with MediaPipe, in order to exploit the advantages of hand tracking as a user interface. The implementation is a game controlled by hand gestures only. The paper also compares three implementations of MediaPipe: one ported from C++ to C#, one using tflite, and one using Barracuda. Comparisons were made on both PC and mobile. On PC, Barracuda was the fastest, reaching a maximum of 208 frames per second, but on mobile it was the slowest, dropping to a minimum of 12 frames per second. Although the result may vary with the project, it therefore still seems difficult to apply Barracuda to mobile content.




F. Zhang et al., “MediaPipe Hands: On-device Real-time Hand Tracking,” arXiv, Jun. 2020. doi: 10.48550/arXiv.2006.10214.

C. Lugaresi et al., “MediaPipe: A Framework for Building Perception Pipelines,” arXiv, Jun. 2019. doi: 10.48550/arXiv.1906.08172.

B. Duy Khuat, D. Thai Phung, H. Thi Thu Pham, A. Ngoc Bui, and S. Tung Ngo, “Vietnamese sign language detection using Mediapipe,” in Proc. of the 2021 10th International Conference on Software and Computer Applications (ICSCA ’21). New York, NY, USA: Association for Computing Machinery, Jul. 2021, pp. 162–165. doi: 10.1145/3457784.3457810.

A. S. B. Pauzi et al., “Movement Estimation Using Mediapipe BlazePose,” in Advances in Visual Informatics, 2021, pp. 562–571. doi: 10.1007/978-3-030-90235-3_49.

Unity Manual, “Introduction to Barracuda.” Accessed: Apr. 11, 2024. [Online]. Available: https://docs.unity3d.com/Packages/com.unity.barracuda@3.0/manual/index.html

ONNX documentation, “Introduction to ONNX.” Accessed: Aug. 28, 2023. [Online]. Available: https://onnx.ai/onnx/intro/

Unity Japan, “Multi-platform operation of ONNX neural network model using Unity Barracuda - CEDEC2021,” Aug. 2021. Accessed: Sep. 04, 2023. [Online]. Available: https://www.youtube.com/watch?v=dMgm4ZYfaUI

M. Mazzamuto, F. Ragusa, A. Resta, G. Farinella, and A. Furnari, “A Wearable Device Application for Human-Object Interactions Detection,” in Proc. of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, Lisbon, Portugal, 2023, pp. 664–671. doi: 10.5220/0011725800003417.

R. C. Castanyer, S. Martínez-Fernández, and X. Franch, “Integration of Convolutional Neural Networks in Mobile Applications,” arXiv, Mar. 2021. doi: 10.48550/arXiv.2103.07286.

J. Kim, J. Lee, M. Kim, and D. Kim, “Design and development of traditional lion mask avatar mapping and animation system based on the user motion recognition using deep learning technology,” in Proc. HCI Korea 2023, 2023, pp. 5–9.

G. Garg and S. Shivani, “Controller free hand interaction in Virtual Reality,” in 2022 OITS International Conference on Information Technology (OCIT), Feb. 2022, pp. 553–557. doi: 10.1109/OCIT56763.2022.00108.




How to Cite

Beom Jun Jo. (2024). A Study on the Classification of Hand Gesture for Mobile Virtual Reality with MediaPipe. International Journal of Intelligent Systems and Applications in Engineering, 12(21s), 3555–. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/6076



Research Article