Unsupervised Machine Learning Approaches in NLP: A Comparative Study of Topic Modeling with BERTopic and LDA


  • Christian Y. Sy, Lany L. Maceda, Nancy M. Flores, Mideth B. Abisado


Keywords: unsupervised machine learning, natural language processing (NLP), topic modeling, BERTopic, Latent Dirichlet Allocation (LDA), UAQTE program.


This research aimed to understand the issues and challenges encountered by beneficiaries of the Philippines' Universal Access to Quality Tertiary Education (UAQTE) program through a comparative analysis of two topic modeling techniques, BERTopic and Latent Dirichlet Allocation (LDA). Student responses were gathered from the ground up using the "Boses Ko" ("My Voice") toolkit. The evaluation combined automatic metrics, such as silhouette and coherence scores, with domain experts' insights. BERTopic excelled in semantic relevance and coherence, while LDA was effective at forming distinct clusters. The key themes identified were "Academic Difficulties," "Financial Difficulties," "Grant Disbursement," "Pandemic-Related Challenges," and "Program Implementation." The research concludes with actionable recommendations for the UAQTE program: enhanced academic support, improved financial assistance, flexible grant disbursement, strategies to tackle pandemic-related challenges, and a structured feedback mechanism. These recommendations are intended to guide policy reforms and to encourage continuous evaluation to ensure long-term effectiveness in the educational sector. Overall, the study provides insight into the application of topic modeling in educational policy analysis and emphasizes the need for nuanced model selection and interpretation for impactful policy development.




Research Article