Classification Model Based on Supervised Learning in the Constituent Context of the Ayacucho Region, Peru.

Authors

  • Yordan Sullca-Palomino, Yudi Guzmán-Monteza

Keywords:

Supervised learning, Data mining, Constituent process, Annotation rules

Abstract

Data from platforms such as Twitter (now called X), analyzed with data mining techniques, offer a valuable opportunity to study public preferences, particularly in discussions of political and social issues. In this study, we developed a text classification model to categorize content related to the constituent process in Ayacucho. Using data collected from Twitter, we sought to classify each text as 'constituent' or 'non-constituent'. Three supervised learning techniques, support vector machine (SVM), random forest (RF), and naive Bayes (NB), were applied in combination with three vectorization methods: bag of words (BOW), TF-IDF, and Word2Vec (W2V). An annotation process was established to label the classes, with data reliability supported by a Kappa coefficient of 0.72. The data were divided into training, test, and validation sets, and data augmentation strategies were explored to address data imbalance. Experimental results on the validation dataset showed that the SVM classification model obtained the highest F1 score, 0.74, outperforming the other evaluated models. These findings offer useful insights for researchers facing similar challenges in niche-specific text classification. The annotation methodology, the effectiveness of the classification techniques, and an approach focused on continuous improvement together lay a solid foundation for future projects in this field.
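For readers unfamiliar with the two figures reported in the abstract, the following is a minimal sketch of how a Kappa coefficient and an F1 score can be computed. It assumes Cohen's kappa for two annotators and a binary F1 with 'constituent' as the positive class; the abstract does not state which Kappa variant or F1 averaging the authors used. The function names and the toy labels are hypothetical and are not the study's data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which both annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


def f1_score(y_true, y_pred, positive="constituent"):
    """Binary F1: harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical): C = 'constituent', N = 'non-constituent'.
C, N = "constituent", "non-constituent"
annotator_a = [C, C, N, N, C, N, N, C, N, N]
annotator_b = [C, C, N, N, N, N, N, C, N, C]

print("kappa:", round(cohen_kappa(annotator_a, annotator_b), 2))
# Here annotator_a plays the role of gold labels and annotator_b of predictions.
print("F1:", round(f1_score(annotator_a, annotator_b), 2))
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is preferred over simple percent agreement when one class dominates, as in an imbalanced dataset.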

Published

26.03.2024

How to Cite

Yordan Sullca-Palomino, Yudi Guzmán-Monteza. (2024). Classification Model Based on Supervised Learning in the Constituent Context of the Ayacucho Region, Peru. International Journal of Intelligent Systems and Applications in Engineering, 12(21s), 2173–2185. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/5814

Section

Research Article