A Pre-trained Transformer-based Ensemble Model for Automated Indonesian Fake News Classification
Keywords:
Deep Learning, Fake News, Natural Language Processing, Text Mining, Transformer Model
Abstract
Fake news often aims to damage the reputation of a person or entity, or to generate personal gain. The lack of a scalable fake news classification strategy is particularly worrying. Because manually classifying fake news is time-consuming, automatic fake news identification has attracted considerable attention in the Natural Language Processing (NLP) community. On recent Indonesian-language news datasets, existing machine learning algorithms such as KNN and Naïve Bayes have been applied to this task; however, they cannot capture the semantic meaning of words, so some of the context is lost. To address these limitations, this paper introduces a prediction method that uses an ensemble of pre-trained transformer-based deep learning language models, such as BERT, RoBERTa, and DistilBERT, as the feature extraction method on social media data sources. The system then makes its final prediction by model averaging. The proposed approach yields promising performance, outperforming similar existing works in the literature. More precisely, it achieves a maximum accuracy of 0.887 and an F1 score of 0.878 on the news dataset.
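The model-averaging step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say whether the models' probabilities or raw scores are averaged, nor whether the models are weighted, so the code below assumes an unweighted mean of per-model softmax probabilities. The function names, the label order ("real", "fake"), and the example logits are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def average_probabilities(per_model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Unweighted mean of the class probabilities across models."""
    n_models = len(per_model_probs)
    n_classes = len(per_model_probs[0])
    return [
        sum(probs[c] for probs in per_model_probs) / n_models
        for c in range(n_classes)
    ]


def ensemble_predict(
    per_model_logits: Sequence[Sequence[float]],
    labels: Sequence[str] = ("real", "fake"),  # assumed label order
) -> Tuple[str, List[float]]:
    """Average the softmax outputs of all models and pick the top class."""
    probs = [softmax(logits) for logits in per_model_logits]
    avg = average_probabilities(probs)
    best = max(range(len(avg)), key=avg.__getitem__)
    return labels[best], avg


# Hypothetical logits for one news item from three fine-tuned classifiers.
example_logits = {
    "bert": [0.2, 1.1],
    "roberta": [1.5, 0.3],
    "distilbert": [0.1, 0.9],
}

label, avg_probs = ensemble_predict(list(example_logits.values()))
print(label, [round(p, 3) for p in avg_probs])
```

In this example two of the three models lean toward "fake" and one leans toward "real"; averaging probabilities rather than taking a majority vote lets a confident model carry more influence than an uncertain one.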
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Under this license, users must give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.