A Random Forest Model for Prediction of Software Engineering Skill Set among Computer Science Students through Explainable AI


Jasmin Nizar, R. Sharmila, K. U. Jaseena


Keywords: Skillset, Software Engineering, Explainable Artificial Intelligence, Principal Component Analysis, Random Forest, Machine Learning, SHAP.


Student skill evaluation is an essential part of education because it reveals each student's unique talents, strengths, abilities, and areas for development. Skill-based education in software engineering aims to close the gap between academic curricula and industry demands, so that graduates can contribute effectively in a professional software development setting. This approach values practical skills alongside academic knowledge, in line with the dynamic and rapidly evolving nature of the software industry. A well-rounded combination of soft skills, life skills, and technical skills is frequently cited as the reason individuals succeed on software development projects, and achieving a balance among these proficiencies is crucial for excelling in the fast-changing, collaborative environment of software development. Predicting students' skill sets across this spectrum, from soft and life skills to technical expertise, is therefore an important task in today's educational landscape. This research introduces a novel predictive framework that combines the Random Forest (RF) algorithm, Principal Component Analysis (PCA), and Explainable Artificial Intelligence (XAI) to predict the skill sets of software engineering students. Random Forest improves predictive accuracy and robustness by aggregating the outputs of multiple decision trees. To further improve the efficiency of the model, the framework applies PCA to derive a smaller set of uncorrelated components that retain most of the information in the original features. In addition, the study uses the XAI technique SHAP (SHapley Additive exPlanations) to identify the features that contribute most to accurate predictions. The performance of the proposed classification model is evaluated using accuracy, precision, recall, F1 score, and the Area Under the ROC Curve (AUC).
The simulation results indicate that the proposed PCA-enhanced Random Forest model with XAI achieves higher predictive accuracy than the baseline machine learning models.
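The modeling pipeline summarized in the abstract (PCA for feature extraction, a Random Forest classifier, and evaluation by accuracy, precision, recall, F1 score, and AUC) can be sketched with scikit-learn as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic (`make_classification` stands in for the student skill survey), and the 95% explained-variance threshold, the 200 trees, and the logistic-regression baseline are hypothetical choices made only for the example.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a student skill survey (assumption): 30 numeric
# indicators and a binary label such as "skill set adequate / not adequate".
X, y = make_classification(n_samples=600, n_features=30, n_informative=8,
                           n_redundant=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)


def evaluate(model):
    """Fit on the training split and report the metrics named in the abstract."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]  # class-1 scores for the ROC AUC
    return {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred),
        "recall": recall_score(y_test, pred),
        "f1": f1_score(y_test, pred),
        "auc": roc_auc_score(y_test, proba),
    }


# PCA-enhanced Random Forest: standardize, keep the principal components that
# explain 95% of the variance, then average the class-probability estimates
# of 200 decision trees.
pca_rf = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95, svd_solver="full"),
    RandomForestClassifier(n_estimators=200, random_state=0),
)
# Hypothetical baseline for comparison; the paper's own baselines may differ.
baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

results = {"pca_rf": evaluate(pca_rf), "logreg": evaluate(baseline)}
for name, scores in results.items():
    print(name, {k: round(float(v), 3) for k, v in scores.items()})
print("PCA components kept:", pca_rf.named_steps["pca"].n_components_)
```

Keeping `StandardScaler` and `PCA` inside the pipeline means the projection is learned from the training split only, so no information from the test split leaks into the extracted features.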

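SHAP attributes a prediction to input features using Shapley values: each feature's contribution is its average marginal effect on the model output over all coalitions of the other features. As a minimal sketch, assuming four hypothetical skill features, synthetic data, and a background sample that supplies values for "absent" features, these values can be computed exactly by brute-force enumeration. The feature names and data are illustrative only; the `shap` library provides much faster estimators (such as `TreeExplainer` for tree ensembles), and enumeration is feasible only for a handful of features.

```python
from itertools import combinations
from math import factorial

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical skill features with synthetic values (assumption).
FEATURES = ["communication", "teamwork", "problem_solving", "programming"]
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 2] + X[:, 3] + 0.3 * X[:, 0] > 0).astype(int)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)


def f(data):
    """Model output being explained: predicted probability of class 1."""
    return model.predict_proba(data)[:, 1]


def value(x, subset, background):
    """v(S): mean prediction with features in S fixed to x, the rest from background."""
    data = background.copy()
    cols = list(subset)
    data[:, cols] = x[cols]
    return f(data).mean()


def exact_shapley(x, background):
    """Exact Shapley values by enumerating every coalition of the other features."""
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                gain = value(x, S + (i,), background) - value(x, S, background)
                phi[i] += weight * gain
    return phi


background = X[:100]
x = X[200]
phi = exact_shapley(x, background)
for name, contribution in sorted(zip(FEATURES, phi), key=lambda t: -abs(t[1])):
    print(f"{name:>16}: {contribution:+.3f}")
```

By construction the contributions sum to the prediction for `x` minus the average prediction over the background sample, which is the additivity property that lets SHAP rank features by how much they move an individual prediction.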




How to Cite

Nizar, J., Sharmila, R., & Jaseena, K. U. (2024). A Random Forest Model for Prediction of Software Engineering Skill Set among Computer Science Students through Explainable AI. International Journal of Intelligent Systems and Applications in Engineering, 12(21s), 2633–2650. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/5866



Research Article