User Login Behaviour Analysis in HPC clusters using Data Analysis and Probabilistic Technique

Authors

  • Atharv Nagarikar, Senior Project Engineer, HPC-Technologies, C-DAC, Pune
  • Abhishek Patel, Project Engineer, HPC-Technologies, C-DAC, Pune
  • Krishan Gopal Gupta, Joint Director, HPC-Technologies, C-DAC, Pune
  • Shashank Sharma, Principal Technical Officer, HPC-Technologies, C-DAC, Pune
  • Mohammed Afzal, Module Leader, HPC-Technologies, C-DAC, Pune
  • Sanjay Wandhekarl, Senior Director HOD, HPC-Technologies, C-DAC, Pune

Keywords:

Data Analysis, Feature Engineering, HPC clusters, HPC security, Machine Learning, Probabilistic Modeling

Abstract

This paper examines the login behaviour of users in HPC clusters using machine learning, data analysis and probabilistic techniques, with the aim of identifying patterns that reveal anomalous logins based on source IP address and login time. A customized probabilistic model monitors each user's login behaviour in order to detect and stop unauthorized activity in HPC clusters. The paper also discusses the data preprocessing and feature engineering techniques used to extract information from sshd logs. Several attempts were made to use machine learning models to learn and store user login patterns from sshd logs, but most of these models proved unsuccessful. Because the sshd logs do not exhibit distinct login behaviour or patterns for individual users, models trained on specific behaviours or patterns struggle to detect even slight changes in a user's login behaviour, which results in a high failure rate. A major contributing factor is that the dataset is highly imbalanced: it consists solely of legitimate (true) logins, which makes it difficult for the models to identify outliers or patterns. Additionally, because users do not follow any specific pattern, achieving successful results with such models may require establishing a single pattern and categorizing the login behaviour of all users against it. To overcome these challenges in identifying user login patterns and detecting anomalies, an automated approach is needed: a system that detects and stores each user's login patterns as probability metrics along with the associated IP addresses. These metrics can then be used to predict anomalous activity based on the identified patterns.
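The abstract describes storing each user's login patterns as probability metrics together with IP addresses and flagging logins that deviate from them, but it does not specify the model. The following is a minimal Python sketch of that idea under stated assumptions, not the authors' implementation: the features are (user, source IP, hour of day) parsed from OpenSSH "Accepted ..." lines in the traditional syslog layout; IP and hour are treated as independent; counts are Laplace-smoothed; and the cut-off value is arbitrary. The names `parse_login`, `LoginProfiler`, `alpha` and `threshold` are hypothetical.

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Iterable, Optional, Tuple

# Matches OpenSSH successful-login lines in the traditional syslog layout, e.g.
# "Nov  4 09:15:02 login01 sshd[1234]: Accepted publickey for alice from 10.0.0.5 port 50522 ssh2"
# Assumption: other timestamp formats (e.g. ISO 8601 from rsyslog/journald) are not handled.
ACCEPTED_RE = re.compile(
    r"^\w{3}\s+\d+\s+(?P<hour>\d{2}):\d{2}:\d{2}\s+\S+\s+sshd\[\d+\]:\s+"
    r"Accepted\s+\S+\s+for\s+(?P<user>\S+)\s+from\s+(?P<ip>\S+)\s+port\s+\d+"
)


def parse_login(line: str) -> Optional[Tuple[str, str, int]]:
    """Extract the (user, source IP, hour of day) features from an 'Accepted' line."""
    m = ACCEPTED_RE.search(line)
    if m is None:
        return None  # failed attempts, disconnects, etc. are ignored
    return m.group("user"), m.group("ip"), int(m.group("hour"))


@dataclass
class UserProfile:
    ip_counts: Counter = field(default_factory=Counter)
    hour_counts: Counter = field(default_factory=Counter)
    total: int = 0


class LoginProfiler:
    """Hypothetical per-user store of login-IP and login-hour probabilities."""

    def __init__(self, alpha: float = 1.0, threshold: float = 0.01):
        self.alpha = alpha          # Laplace smoothing constant (assumed value)
        self.threshold = threshold  # anomaly cut-off on the joint score (assumed value)
        self.profiles = defaultdict(UserProfile)

    def fit(self, lines: Iterable[str]) -> "LoginProfiler":
        """Count IPs and hours per user from historical (assumed benign) logins."""
        for line in lines:
            event = parse_login(line)
            if event is None:
                continue
            user, ip, hour = event
            profile = self.profiles[user]
            profile.ip_counts[ip] += 1
            profile.hour_counts[hour] += 1
            profile.total += 1
        return self

    def score(self, user: str, ip: str, hour: int) -> float:
        """Smoothed P(ip | user) * P(hour | user), treating IP and hour as independent."""
        profile = self.profiles.get(user)  # .get() does not create an empty profile
        if profile is None or profile.total == 0:
            return 0.0  # unknown user: no history, so treat as maximally unusual
        # One extra IP category reserves probability mass for never-seen addresses.
        n_ip_categories = len(profile.ip_counts) + 1
        p_ip = (profile.ip_counts[ip] + self.alpha) / (
            profile.total + self.alpha * n_ip_categories
        )
        p_hour = (profile.hour_counts[hour] + self.alpha) / (
            profile.total + self.alpha * 24
        )
        return p_ip * p_hour

    def is_anomalous(self, user: str, ip: str, hour: int) -> bool:
        return self.score(user, ip, hour) < self.threshold
```

For example, after fitting on ten logins by `alice` from `10.0.0.5` between 09:00 and 09:59, `is_anomalous("alice", "10.0.0.5", 9)` returns `False` (score 121/408, about 0.30), while `is_anomalous("alice", "203.0.113.7", 3)` returns `True` (score 1/408, about 0.0025). One reason a density-style model of this kind may suit the setting the abstract describes is that it only needs legitimate logins to build each profile, so it does not depend on labelled attack examples that an imbalanced, all-true dataset lacks.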

Published

04.11.2023

How to Cite

Nagarikar, A., Patel, A., Gupta, K. G., Sharma, S., Afzal, M., & Wandhekarl, S. (2023). User Login Behaviour Analysis in HPC clusters using Data Analysis and Probabilistic Technique. International Journal of Intelligent Systems and Applications in Engineering, 12(3s), 250–259. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/3703

Section

Research Article