VRSS: A Touch-to-Vision-Text-Audio Artificial Multi-Modal Sensory System to Demonstrate Neural Network Processes

Authors

  • Vijay Harkare Student, Department of Computer Engineering, Dwarkadas J. Sanghvi College of Engineering, Mumbai – 400 056, INDIA
  • Riddhi Sanghani Student, Department of Computer Engineering, Dwarkadas J. Sanghvi College of Engineering, Mumbai – 400 056, INDIA
  • Shruti Prasad Student, Department of Computer Engineering, Dwarkadas J. Sanghvi College of Engineering, Mumbai – 400 056, INDIA
  • Sanika Ardekar Student, Department of Computer Engineering, Dwarkadas J. Sanghvi College of Engineering, Mumbai – 400 056, INDIA
  • Aruna Gawade Assistant Professor, Department of Computer Engineering, Dwarkadas J. Sanghvi College of Engineering, Mumbai – 400 056, INDIA
  • Ramchandra Mangrulkar Associate Professor, Department of Computer Engineering, Dwarkadas J. Sanghvi College of Engineering, Mumbai – 400 056, INDIA

Keywords:

Classification, Convolutional Neural Network (CNN), DIGIT sensor, multi-modal, touch, vision

Abstract

The human brain is the most complex organ in the human body, and simulating its functionality, particularly its multi-modal sensory processing, is an exceedingly challenging task. Results from biological experiments show that object instances can be identified from tactile signals alone. This research applies similar concepts to model a multi-modal sensory input processing system for tactile inputs. VRSS is a novel touch-to-vision-to-text-to-audio system that simulates the multi-modal sensory behavior of the brain by converting tactile inputs into visual images, which are further converted into text and audio. The main aim of this research is to classify object instances based on tactile signals. Tactile inputs are captured and implicitly converted to visual inputs using the DIGIT sensor simulated in the TACTO simulator, and the resulting images are classified using Convolutional Neural Networks. The classification output is further converted into audio, thus simulating three modalities: touch, vision, and sound. To construct VRSS, multiple pretrained CNNs with different hyperparameter configurations were tested; the pretrained ConvNeXtTiny model achieved the best accuracy among them, 91%. It was further modified, and the resulting custom VRSS CNN model reached an accuracy of 95.83%. These results help expand the applicability of different CNN architectures, facilitate a deeper understanding of the human multi-modal sensory system, and offer wide scope in artificial intelligence and robotics, particularly for the navigation of uncharted territories.
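
To make the described flow concrete, the following is a minimal, hypothetical Python/TensorFlow sketch of such a touch-to-vision-to-text-to-audio pipeline: simulated DIGIT/TACTO tactile images are classified with a fine-tuned, pretrained ConvNeXtTiny, and the predicted label is converted to speech. The dataset directory, file names, training settings, and the use of gTTS for the audio step are illustrative assumptions, not details taken from the paper.

    # Hypothetical sketch of a VRSS-style pipeline: tactile images rendered by the
    # TACTO-simulated DIGIT sensor are classified with a fine-tuned ConvNeXtTiny,
    # and the predicted class label is converted to text and audio.
    import tensorflow as tf
    from gtts import gTTS

    IMG_SIZE = (224, 224)
    DATA_DIR = "tacto_digit_images/"   # assumed folder of simulated tactile images, one subfolder per object class

    # Load tactile images grouped into per-object subfolders (assumed layout).
    train_ds = tf.keras.utils.image_dataset_from_directory(
        DATA_DIR, validation_split=0.2, subset="training", seed=42,
        image_size=IMG_SIZE, batch_size=32)
    val_ds = tf.keras.utils.image_dataset_from_directory(
        DATA_DIR, validation_split=0.2, subset="validation", seed=42,
        image_size=IMG_SIZE, batch_size=32)
    class_names = train_ds.class_names

    # Pretrained ConvNeXtTiny backbone with a new classification head.
    base = tf.keras.applications.ConvNeXtTiny(
        include_top=False, weights="imagenet", input_shape=IMG_SIZE + (3,))
    base.trainable = False  # freeze the backbone for initial transfer learning
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(len(class_names), activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(train_ds, validation_data=val_ds, epochs=10)

    # Classify one tactile image and speak the result (touch -> vision -> text -> audio).
    img = tf.keras.utils.load_img("sample_touch.png", target_size=IMG_SIZE)  # assumed sample image
    probs = model.predict(tf.expand_dims(tf.keras.utils.img_to_array(img), 0))
    label = class_names[int(tf.argmax(probs[0]))]
    gTTS(text=f"The touched object is a {label}").save("prediction.mp3")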


References

S. Luo, W. Yuan, E. Adelson, A. G. Cohn, and R. Fuentes, “ViTac: Feature Sharing between Vision and Tactile Sensing for Cloth Texture Recognition,” in Proceedings - IEEE International Conference on Robotics and Automation, 2018. doi: 10.1109/ICRA.2018.8460494.

S. Luo, J. Bimbo, R. Dahiya, and H. Liu, “Robotic tactile perception of object properties: A review,” Mechatronics, vol. 48. 2017. doi: 10.1016/j.mechatronics.2017.11.002.

B. Wang, Y. Yang, X. Xu, A. Hanjalic, and H. T. Shen, “Adversarial cross-modal retrieval,” in MM 2017 - Proceedings of the 2017 ACM Multimedia Conference, 2017. doi: 10.1145/3123266.3123326.

F. R. Hogan, M. Jenkin, S. Rezaei-Shoshtari, Y. Girdhar, D. Meger, and G. Dudek, “Seeing through your Skin: Recognizing objects with a novel visuotactile sensor,” in Proceedings - 2021 IEEE Winter Conference on Applications of Computer Vision, WACV 2021, 2021. doi: 10.1109/WACV48630.2021.00126.

S. Sundaram, P. Kellnhofer, Y. Li, J. Y. Zhu, A. Torralba, and W. Matusik, “Learning the signatures of the human grasp using a scalable tactile glove,” Nature, vol. 569, no. 7758, 2019, doi: 10.1038/s41586-019-1234-z.

O. Ozioko and R. Dahiya, “Smart Tactile Gloves for Haptic Interaction, Communication, and Rehabilitation,” Advanced Intelligent Systems, vol. 4, no. 2, 2022, doi: 10.1002/aisy.202100091.

M. Altamirano Cabrera, J. Heredia, and D. Tsetserukou, “Tactile perception of objects by the user’s palm for the development of multi-contact wearable tactile displays,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2020. doi: 10.1007/978-3-030-58147-3_6.

P. K. Murali, C. Wang, D. Lee, R. Dahiya, and M. Kaboli, “Deep Active Cross-Modal Visuo-Tactile Transfer Learning for Robotic Object Recognition,” IEEE Robot Autom Lett, vol. 7, no. 4, 2022, doi: 10.1109/LRA.2022.3191408.

F. Ito and K. Takemura, “A model for estimating tactile sensation by machine learning based on vibration information obtained while touching an object,” Sensors, vol. 21, no. 23, 2021, doi: 10.3390/s21237772.

X. Zhang et al., “Target classification method of tactile perception data with deep learning,” Entropy, vol. 23, no. 11, 2021, doi: 10.3390/e23111537.

S. Cai, K. Zhu, Y. Ban, and T. Narumi, “Visual-Tactile Cross-Modal Data Generation Using Residue-Fusion GAN with Feature-Matching and Perceptual Losses,” IEEE Robot Autom Lett, vol. 6, no. 4, 2021, doi: 10.1109/LRA.2021.3095925.

G. Rouhafzay, A. M. Cretu, and P. Payeur, “Transfer of learning from vision to touch: A hybrid deep convolutional neural network for visuo-tactile 3d object recognition,” Sensors (Switzerland), vol. 21, no. 1, 2021, doi: 10.3390/s21010113.

J. T. Lee, D. Bollegala, and S. Luo, “‘Touching to see’ and ‘seeing to feel’: Robotic cross-modal sensory data generation for visual-tactile perception,” in Proceedings - IEEE International Conference on Robotics and Automation, 2019. doi: 10.1109/ICRA.2019.8793763.

J. Lin, R. Calandra, and S. Levine, “Learning to identify object instances by touch: Tactile recognition via multimodal matching,” in Proceedings - IEEE International Conference on Robotics and Automation, 2019. doi: 10.1109/ICRA.2019.8793885.

X. Li, H. Liu, J. Zhou, and F. Sun, “Learning cross-modal visual-tactile representation using ensembled generative adversarial networks,” Cognitive Computation and Systems, vol. 1, no. 2, 2019, doi: 10.1049/ccs.2018.0014.

Y. Li, J. Y. Zhu, R. Tedrake, and A. Torralba, “Connecting touch and vision via cross-modal prediction,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2019. doi: 10.1109/CVPR.2019.01086.

S. Pohtongkam and J. Srinonchat, “Tactile object recognition for humanoid robots using new designed piezoresistive tactile sensor and dcnn,” Sensors, vol. 21, no. 18, 2021, doi: 10.3390/s21186024.

P. Falco, S. Lu, A. Cirillo, C. Natale, S. Pirozzi, and D. Lee, “Cross-modal visuo-tactile object recognition using robotic active exploration,” in Proceedings - IEEE International Conference on Robotics and Automation, 2017. doi: 10.1109/ICRA.2017.7989619.

G. Izatt, G. Mirano, E. Adelson, and R. Tedrake, “Tracking objects with point clouds from vision and touch,” in Proceedings - IEEE International Conference on Robotics and Automation, 2017. doi: 10.1109/ICRA.2017.7989460.

M. Lambeta et al., “DIGIT: A Novel Design for a Low-Cost Compact High-Resolution Tactile Sensor with Application to In-Hand Manipulation,” IEEE Robot Autom Lett, vol. 5, no. 3, 2020, doi: 10.1109/LRA.2020.2977257.

S. Wang, M. Lambeta, P. W. Chou, and R. Calandra, “TACTO: A Fast, Flexible, and Open-Source Simulator for High-Resolution Vision-Based Tactile Sensors,” IEEE Robot Autom Lett, vol. 7, no. 2, 2022, doi: 10.1109/LRA.2022.3146945.

F. Rajeena P. P., A. S. U., M. A. Moustafa, and M. A. S. Ali, “Detecting Plant Disease in Corn Leaf Using EfficientNet Architecture—An Analytical Approach,” Electronics (Basel), vol. 12, no. 8, p. 1938, Apr. 2023, doi: 10.3390/electronics12081938.

P. K. Allen and K. S. Roberts, “Haptic object recognition using a multi-fingered dextrous hand,” in Proceedings - IEEE International Conference on Robotics and Automation, 1989. doi: 10.1109/robot.1989.100011.

S. Caselli, C. Magnanini, and F. Zanichelli, “On the robustness of haptic object recognition based on polyhedral shape representations,” in IEEE International Conference on Intelligent Robots and Systems, 1995. doi: 10.1109/iros.1995.526160.

Z. Pezzementi, C. Reyda, and G. D. Hager, “Object mapping, recognition, and localization from tactile geometry,” in Proceedings - IEEE International Conference on Robotics and Automation, 2011. doi: 10.1109/ICRA.2011.5980363.

M. Meier, M. Schöpfer, R. Haschke, and H. Ritter, “A probabilistic approach to tactile shape reconstruction,” IEEE Transactions on Robotics, vol. 27, no. 3, 2011, doi: 10.1109/TRO.2011.2120830.

A. Aggarwal, P. Kampmann, J. Lemburg, and F. Kirchner, “Haptic object recognition in underwater and deep-sea environments,” J Field Robot, vol. 32, no. 1, 2015, doi: 10.1002/rob.21538.

V. K. Nanayakkara, G. Cotugno, N. Vitzilaios, D. Venetsanos, T. Nanayakkara, and M. N. Sahinkaya, “The Role of Morphology of the Thumb in Anthropomorphic Grasping: A Review,” Frontiers in Mechanical Engineering, vol. 3. 2017. doi: 10.3389/fmech.2017.00005.

J. Bimbo, S. Luo, K. Althoefer, and H. Liu, “In-Hand Object Pose Estimation Using Covariance-Based Tactile To Geometry Matching,” IEEE Robot Autom Lett, vol. 1, no. 1, 2016, doi: 10.1109/LRA.2016.2517244.

J. M. Gandarias, A. J. Garcia-Cerezo, and J. M. Gomez-De-Gabriel, “CNN-Based Methods for Object Recognition with High-Resolution Tactile Sensors,” IEEE Sens J, vol. 19, no. 16, 2019, doi: 10.1109/JSEN.2019.2912968.

S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10. 2010. doi: 10.1109/TKDE.2009.191.

Y. Ganin et al., “Domain-adversarial training of neural networks,” Journal of Machine Learning Research, vol. 17, 2016.

G. Csurka, “A comprehensive survey on domain adaptation for visual applications,” in Advances in Computer Vision and Pattern Recognition, 2017. doi: 10.1007/978-3-319-58347-1_1.

P. Isola, J. Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, 2017. doi: 10.1109/CVPR.2017.632.

K. Saenko, B. Kulis, M. Fritz, and T. Darrell, “Adapting visual category models to new domains,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2010. doi: 10.1007/978-3-642-15561-1_16.

P. Koskinen, P. van der Meer, M. Steiner, T. Keller, and M. Bianchi, “Automated Feedback Systems for Programming Assignments using Machine Learning,” Kuwait Journal of Machine Learning, vol. 2, no. 2. [Online]. Available: http://kuwaitjournals.com/index.php/kjml/article/view/190

J. Nieminen, J. Bakker, M. Mayer, P. Schmid, and A. Ricci, “Exploring Explainable AI in Educational Machine Learning Models,” Kuwait Journal of Machine Learning, vol. 2, no. 2. [Online]. Available: http://kuwaitjournals.com/index.php/kjml/article/view/191

N. S. Vadivu, G. Gupta, Q. N. Naveed, T. Rasheed, S. K. Singh, and D. Dhabliya, “Correlation-based mutual information model for analysis of lung cancer CT image,” BioMed Research International, vol. 2022, p. 6451770, 2022, doi: 10.1155/2022/6451770.


Published

16.07.2023

How to Cite

Harkare, V., Sanghani, R., Prasad, S., Ardekar, S., Gawade, A., & Mangrulkar, R. (2023). VRSS: A Touch-to-Vision-Text-Audio Artificial Multi-Modal Sensory System to Demonstrate Neural Network Processes. International Journal of Intelligent Systems and Applications in Engineering, 11(3), 691–703. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/3275

Issue

Vol. 11 No. 3 (2023)

Section

Research Article