Unsupervised Machine Learning Approaches in NLP: A Comparative Study of Topic Modeling with BERTopic and LDA

Authors

  • Christian Y. Sy, Lany L. Maceda, Nancy M. Flores, Mideth B. Abisado

Keywords:

unsupervised machine learning, natural language processing (NLP), topic modeling, BERTopic, Latent Dirichlet Allocation (LDA), UAQTE program.

Abstract

This research aimed to understand the issues and challenges encountered by beneficiaries of the Philippines' Universal Access to Quality Tertiary Education (UAQTE) program, using a comparative analysis of BERTopic and Latent Dirichlet Allocation (LDA) topic modeling techniques. The "Boses Ko" or "My Voice" toolkit was utilized to gather student responses from the ground up. The study found that BERTopic excelled in semantic relevance and coherence, while LDA effectively formed distinct clusters. The evaluation combined automatic metrics, such as silhouette and coherence scores, with domain experts' insights. Key themes identified included "Academic Difficulties," "Financial Difficulties," "Grant Disbursement," "Pandemic-Related Challenges," and "Program Implementation." The research concluded with actionable recommendations for the UAQTE program, advocating for enhanced academic support, improved financial assistance, flexible grant disbursement, strategies to tackle pandemic-related challenges, and establishing a structured feedback mechanism. These suggestions guide policy reforms, encouraging continuous evaluation to ensure long-term effectiveness in the educational sector. Overall, this study provides valuable insights into the application of topic modeling in educational policy analysis and emphasizes the need for nuanced model selection and interpretation for impactful policy development.
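As an illustration of one of the automatic metrics named in the abstract, the silhouette score can be computed from document vectors and their topic assignments. The sketch below is a minimal pure-Python version using Euclidean distance; the function name, the toy two-dimensional "embeddings," and the topic labels are all hypothetical, and the abstract does not say how the study itself computed the score. Coherence scoring is not shown.

```python
from math import dist
from statistics import mean


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance).

    points: list of equal-length numeric tuples (e.g. document embeddings)
    labels: list of cluster/topic ids, one per point
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's distance to itself is 0, so divide by len - 1)
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean(dist(point, q) for q in members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


# Hypothetical toy data: four "documents" in a 2-D embedding space.
embeddings = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
well_separated = [0, 0, 1, 1]   # topics match the two visible groups
poorly_assigned = [0, 1, 0, 1]  # topics cut across the groups

good = silhouette_score(embeddings, well_separated)
bad = silhouette_score(embeddings, poorly_assigned)
print(f"well separated:  {good:.3f}")
print(f"poorly assigned: {bad:.3f}")
```

Scores near 1 indicate compact, well-separated clusters, while scores near or below 0 indicate overlapping or misassigned ones. This is the sense in which a silhouette score can suggest that a model "formed distinct clusters," although it says nothing about whether the topics are semantically meaningful.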

Published

17.05.2024

How to Cite

Christian Y. Sy. (2024). Unsupervised Machine Learning Approaches in NLP: A Comparative Study of Topic Modeling with BERTopic and LDA. International Journal of Intelligent Systems and Applications in Engineering, 12(21s), 3276 –. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/6018

Section

Research Article