Comprehensive Survey on Agent Based Deep Learning Techniques for Space Landing Missions


  • Utkarsh R. Moholkar, Department of Computer Engineering, Smt. Kashibai Navale College of Engineering, Savitribai Phule Pune University, Pune, India
  • Dipti D. Patil, Department of Information Technology, MKSSS’s Cummins College of Engineering for Women, Savitribai Phule Pune University, Pune, India


Keywords

Reinforcement Learning (RL), Digital Terrain Model (DTM), Lunar Reconnaissance Orbiter Camera (LROC), Deep Learning (DL), Deep Reinforcement Learning (DRL), Deep Q-Network (DQN), Proximal Policy Optimization (PPO), Advantage Actor-Critic (A2C), Deep Deterministic Policy Gradient (DDPG), Trust Region Policy Optimization (TRPO)


Abstract

Spacecraft landing is a complex and challenging task that requires precise control and decision-making. In recent years, reinforcement learning (RL) has emerged as a promising approach to spacecraft landing, enabling autonomous and adaptive control strategies. This survey presents an overview of existing research on spacecraft landing using RL. We examine the RL algorithms, simulation environments, and evaluation metrics employed in this domain, and we discuss the challenges, limitations, and future directions for applying RL to spacecraft landing. The survey aims to give researchers and practitioners a comprehensive understanding of the current state of the art in this field and to inspire further advances in spacecraft landing using RL.
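To make the family of algorithms listed in the keywords concrete, the sketch below applies its simplest member, tabular Q-learning, to a toy one-dimensional powered descent. Everything here (the point-mass dynamics, thrust levels, state grid, and reward shaping) is an illustrative assumption and is not drawn from any specific paper in this survey.

```python
import random

# Toy 1-D powered-descent problem solved with tabular Q-learning.
# Dynamics, discretization, and rewards are illustrative assumptions.
GRAVITY = -1.0             # constant downward acceleration
THRUSTS = [0.0, 1.0, 2.0]  # discrete upward thrust accelerations (actions)
DT = 0.1                   # integration time step


def step(alt, vel, action):
    """Advance the point-mass lander one time step; return (alt, vel, reward, done)."""
    vel += (GRAVITY + THRUSTS[action]) * DT
    alt += vel * DT
    if alt <= 0.0:
        # Touchdown: soft landings rewarded, hard impacts penalized by speed.
        reward = 10.0 if abs(vel) < 0.5 else -abs(vel)
        return alt, vel, reward, True
    return alt, vel, -0.01, False  # small per-step cost encourages descending


def discretize(alt, vel):
    """Coarse grid over (altitude, velocity) indexing the tabular Q-table."""
    return (min(int(alt), 10), max(-5, min(5, int(vel))))


def train(episodes=2000, alpha=0.1, gamma=0.99, eps=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {}  # (state, action) -> estimated return
    actions = range(len(THRUSTS))
    for _ in range(episodes):
        alt, vel, done = 10.0, 0.0, False
        for _ in range(500):  # cap episode length in case the policy hovers
            s = discretize(alt, vel)
            a = (rng.randrange(len(THRUSTS)) if rng.random() < eps
                 else max(actions, key=lambda x: q.get((s, x), 0.0)))
            alt, vel, r, done = step(alt, vel, a)
            s2 = discretize(alt, vel)
            target = r if done else r + gamma * max(q.get((s2, x), 0.0) for x in actions)
            q[(s, a)] = q.get((s, a), 0.0) + alpha * (target - q.get((s, a), 0.0))
            if done:
                break
    return q


def touchdown_speed(q):
    """Roll out the greedy policy once; return landing speed, or None if it hovers."""
    alt, vel = 10.0, 0.0
    for _ in range(500):
        s = discretize(alt, vel)
        a = max(range(len(THRUSTS)), key=lambda x: q.get((s, x), 0.0))
        alt, vel, _, done = step(alt, vel, a)
        if done:
            return abs(vel)
    return None
```

The deep variants surveyed (DQN, PPO, A2C, DDPG, TRPO) replace this lookup table with a neural network so the agent can handle continuous, high-dimensional state such as terrain imagery, but the underlying trial-and-error value-update loop is the same.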




References

G. Ciabatti, S. Daftry and R. Capobianco, "Autonomous Planetary Landing via Deep Reinforcement Learning and Transfer Learning," 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Nashville, TN, USA, 2021, pp. 2031-2038, doi: 10.1109/CVPRW53098.2021.00231.

Scorsoglio, Andrea & Furfaro, Roberto & Linares, Richard & Gaudet, Brian. (2020). Image-based Deep Reinforcement Learning for Autonomous Lunar Landing. 10.2514/6.2020-1910.

Gaudet, Brian & Linares, Richard & Furfaro, Roberto. (2020). Deep Reinforcement Learning for Six Degree-of-Freedom Planetary Landing. Advances in Space Research. 65. 10.1016/j.asr.2019.12.030.

Sullivan, Christopher & Bosanac, Natasha. (2020). Using Reinforcement Learning to Design a Low-Thrust Approach into a Periodic Orbit in a Multi-Body System. 10.2514/6.2020-1914.

Gaudet, Brian & Furfaro, Roberto. (2012). Robust Spacecraft Hovering Near Small Bodies in Environments with Unknown Dynamics Using Reinforcement Learning. 10.2514/6.2012-5072.

Russo, Antonia, and Gianluca Lax. 2022. "Using Artificial Intelligence for Space Challenges: A Survey" Applied Sciences 12, no. 10: 5106.

Wilson, C., Riccardi, A. Improving the efficiency of reinforcement learning for a spacecraft powered descent with Q-learning. Optim Eng 24, 223–255 (2023).

Yen, Gary & Hickey, Travis. (2004). Reinforcement learning algorithms for robotic navigation in dynamic environments. ISA Transactions, 43, 217-230. 10.1016/S0019-0578(07)60032-9.

Tipaldi, Massimo & Iervolino, Raffaele & Massenio, Paolo. (2022). Reinforcement learning in spacecraft control applications: Advances, prospects, and challenges. Annual Reviews in Control. 10.1016/j.arcontrol.2022.07.004.

Silvestrini, Stefano, and Michèle Lavagna. 2022. "Deep Learning and Artificial Neural Networks for Spacecraft Dynamics, Navigation and Control" Drones 6, no. 10: 270.

Trisolini, Mirko & Colombo, Camilla & Lewis, Hugh. (2017). Multi-objective optimization for spacecraft design for demise and survivability.

Chen, X. (2020). Data-Efficient Reinforcement and Transfer Learning in Robotics (PhD dissertation, KTH Royal Institute of Technology).

Sutton, Richard S., and Barto, Andrew G. Reinforcement Learning: An Introduction. 2nd ed. The MIT Press, 2018.

Habib, Nazia. 2019. Hands-On Q-Learning with Python. 1st ed. Packt Publishing.

Winder, Phil. Reinforcement Learning. O'Reilly Media, Inc., 2020.

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal Policy Optimization Algorithms," arXiv:1707.06347 [cs.LG], Jul. 2017.

Haarnoja, T., Zhou, A., Abbeel, P., & Levine, S. (2018). Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. Proceedings of the 35th International Conference on Machine Learning, PMLR 80:1861-1870.

T. Tiong, I. Saad, K. T. K. Teo and H. b. Lago, "Deep Reinforcement Learning with Robust Deep Deterministic Policy Gradient," 2020 2nd International Conference on Electrical, Control and Instrumentation Engineering (ICECIE), Kuala Lumpur, Malaysia, 2020, pp. 1-5, doi: 10.1109/ICECIE50279.2020.9309539.

W. Meng, Q. Zheng, Y. Shi and G. Pan, "An Off-Policy Trust Region Policy Optimization Method With Monotonic Improvement Guarantee for Deep Reinforcement Learning," in IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 5, pp. 2223-2235, May 2022, doi: 10.1109/TNNLS.2020.3044196.

Chadi, M. A., & Mousannif, H. (2023). Understanding Reinforcement Learning Algorithms: The Progress from Basic Q-learning to Proximal Policy Optimization. arXiv preprint arXiv:2304.00026.

Powell, W. B., & Frazier, P. (2008). Optimal learning. In State-of-the-Art Decision-Making Tools in the Information-Intensive Age (pp. 213-246). Informs.

Cruz, F., Dazeley, R., Vamplew, P., & Moreira, I. (2021). Explainable robotic systems: Understanding goal-driven actions in a reinforcement learning scenario. Neural Computing and Applications, 1-18

Yang, J., Hou, X., Hu, Y. H., Liu, Y., & Pan, Q. (2020). A reinforcement learning scheme for active multi-debris removal mission planning with modified upper confidence bound tree search. IEEE Access, 8, 108461-108473.

Russo, D. J., Van Roy, B., Kazerouni, A., Osband, I., & Wen, Z. (2018). A tutorial on Thompson sampling. Foundations and Trends® in Machine Learning, 11(1), 1-96.

Braun, D., Marb, M. M., Angelov, J., Wechner, M., & Holzapfel, F. (2023). Worst-Case Analysis of Complex Nonlinear Flight Control Designs Using Deep Q-Learning. Journal of Guidance, Control, and Dynamics, 1-13.

Chen, Y., & Kulla, E. (2019). A Deep Q-Network with Experience Optimization (DQN-EO) for Atari’s Space Invaders. In Web, Artificial Intelligence and Network Applications: Proceedings of the Workshops of the 33rd International Conference on Advanced Information Networking and Applications (WAINA-2019) 33 (pp. 351-361). Springer International Publishing.

Shi, X. N., Zhou, D., Chen, X., & Zhou, Z. G. (2021). Actor-critic-based predefined-time control for spacecraft attitude formation system with guaranteeing prescribed performance on SO (3). Aerospace Science and Technology, 117, 106898.

Miller, D., Englander, J. A., & Linares, R. (2019, August). Interplanetary low-thrust design using proximal policy optimization. In 2019 AAS/AIAA Astrodynamics Specialist Conference (No. GSFC-E-DAA-TN71225)

Zhang, J., Wu, F., Zhao, J., & Xu, F. (2019). A method of attitude control based on deep deterministic policy gradient. In Cognitive Systems and Signal Processing: 4th International Conference, ICCSIP 2018, Beijing, China, November 29-December 1, 2018, Revised Selected Papers, Part II 4 (pp. 197-207). Springer Singapore.

Miller, D., & Linares, R. (2019, February). Low-thrust optimal control via reinforcement learning. In 29th AAS/AIAA Space Flight Mechanics Meeting (Vol. 168, pp. 1817-1834). American Astronautical Society Ka’anapali, Hawaii.

Chen, S. Y. (2011). Kalman filter for robot vision: a survey. IEEE Transactions on industrial electronics, 59(11), 4409-4420.

Rohilla, R., Sikri, V., & Kapoor, R. (2017). Spider monkey optimisation assisted particle filter for robust object tracking. IET Computer Vision, 11(3), 207-219.

Fujii, K. (2013). Extended Kalman filter. Reference Manual, 14, 41.

Dong, J., Zhuang, D., Huang, Y., & Fu, J. (2009). Advances in multi-sensor data fusion: Algorithms and applications. Sensors, 9(10), 7771-7784.

Capra, L., Brandonisio, A., & Lavagna, M. (2023). Network architecture and action space analysis for deep reinforcement learning toward spacecraft autonomous guidance. Advances in Space Research, 71(9), 3787-3802

Peng, X., & Gong, Y. (2022). Globally stable proportional‐integral‐derivative control for spacecraft pose tracking via dual quaternions. IET Control Theory & Applications, 16(18), 1847-1859.

Weiss, A., Baldwin, M., Erwin, R. S., & Kolmanovsky, I. (2015). Model predictive control for spacecraft rendezvous and docking: Strategies for handling constraints and case studies. IEEE Transactions on Control Systems Technology, 23(4), 1638-1647.

Hu, W., Li, Z., Dai, M. Z., & Huang, T. (2023). Robust adaptive control for spacecraft attitude synchronization subject to external disturbances: A performance adjustable event-triggered mechanism. International Journal of Robust and Nonlinear Control, 33(3), 2392-2408.

Chai, R., Tsourdos, A., Savvaris, A., Chai, S., Xia, Y., & Chen, C. P. (2019). Six-DOF spacecraft optimal trajectory planning and real-time attitude control: a deep neural network-based approach. IEEE Transactions on Neural Networks and Learning Systems, 31(11), 5005-5013.

Dewey, D. (2014, March). Reinforcement learning and the reward engineering principle. In 2014 AAAI Spring Symposium Series.

Grzes, M. (2017). Reward shaping in episodic reinforcement learning.

Narvekar, S., Peng, B., Leonetti, M., Sinapov, J., Taylor, M. E., & Stone, P. (2020). Curriculum learning for reinforcement learning domains: A framework and survey. The Journal of Machine Learning Research, 21(1), 7382-7431.

Zhu, Z., Lin, K., Jain, A. K., & Zhou, J. (2023). Transfer learning in deep reinforcement learning: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Janhavi H. Borse, Dipti D. Patil, Vinod Kumar, Sudhir Kumar, "Soft Landing Parameter Measurements for Candidate Navigation Trajectories Using Deep Learning and AI-Enabled Planetary Descent", Mathematical Problems in Engineering, vol. 2022, Article ID 2886312, 14 pages, 2022.

Utkarsh R. Moholkar, Dipti D. Patil, Vinod Kumar, Archana Patil, “Deep Learning Approach for Unmanned Aerial Vehicle Landing”, International Journal of Innovative Technology and Exploring Engineering (IJITEE), ISSN: 2278-3075 (Online), Volume-11, Issue-10, September 2022, doi: 10.35940/ijitee.J9263.09111022.

Janhvi Borse, Dipti D. Patil, “Empirical Analysis of Feature Points Extraction Techniques for Space Applications”, in International Journal of Advanced Computer Science and Applications, ISSN: 2158-107X, Vol. 12, No. 9, 2021

J. Borse, D. Patil and V. Kumar, "Tracking Keypoints from Consecutive Video Frames Using CNN Features for Space Applications", Technical Journal, vol.15, no. 1, pp. 11-17, 2021. [Online].

Chindhe, B., Ramalingam, A., Chavan, S., Hardas, S., Patil, D. (2023). Advances in Vision-Based UAV Manoeuvring Techniques. In: Thampi, S.M., Mukhopadhyay, J., Paprzycki, M., Li, KC. (eds) International Symposium on Intelligent Informatics. ISI 2022. Smart Innovation, Systems and Technologies, vol 333. Springer, Singapore.




How to Cite

Moholkar, U. R., & Patil, D. D. (2024). Comprehensive Survey on Agent Based Deep Learning Techniques for Space Landing Missions. International Journal of Intelligent Systems and Applications in Engineering, 12(16s), 188–200.



Research Article