Ensemble Based Low Complexity Arabic Fake News Detection
Keywords:
Machine Learning, Deep Learning, Natural Language Processing, Fake News, Arabic Fake Detection (AFD), Ensemble Learning (EL)
Abstract
Nowadays, due to the growth of online communication, many people use social media platforms and produce a large amount of content. Fake news creates negative perceptions of society. The rise of social media networks has given fake news a platform to rapidly gain popularity among users. Identifying and labelling Arabic fake news is a major challenge because of the large amount of heterogeneous content and the limited number of related Arabic datasets available. Machine learning (ML), natural language processing (NLP), and deep learning (DL) are commonly used to speed up and automate the analysis of this huge amount of content and to transform unstructured text into a structured form. In this study, a corpus of news websites is developed to detect fake news using several machine learning techniques; it comprises a dataset of 3,185 fake news articles and a new dataset of 1,453 real news articles. This paper shows that, using an aggregation of machine learning and ensemble methods, we can build a prediction model for fake news that achieves an accuracy of up to 100% with low complexity, which can save power and energy, and which can serve as a reference for the detection of fake news in Egypt and Arab countries.
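The abstract does not specify how the machine learning models are aggregated. As an illustration only, the sketch below assumes hard (majority) voting, one common ensemble scheme, over the labels produced by several already-trained base classifiers, and scores the combined output with accuracy, the metric the abstract reports. The function names `majority_vote` and `accuracy`, the label encoding (1 = fake, 0 = real), and the prediction values are hypothetical toy data, not results from this study.

```python
from collections import Counter
from typing import List, Sequence

# Assumed label encoding for this sketch: 1 = fake, 0 = real.


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-classifier labels by hard (majority) voting.

    predictions[k][i] is the label that classifier k assigns to sample i.
    On a tie, Counter.most_common returns the tied label that appears
    first when the classifiers are scanned in order.
    """
    if not predictions:
        raise ValueError("need at least one classifier's predictions")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must label the same number of samples")
    combined = []
    for i in range(n_samples):
        votes = Counter(p[i] for p in predictions)  # count votes for sample i
        combined.append(votes.most_common(1)[0][0])
    return combined


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label equals the gold label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label sequences must be non-empty and equal in length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


if __name__ == "__main__":
    # Hypothetical outputs of three base classifiers on five articles.
    clf_a = [1, 0, 1, 0, 0]
    clf_b = [1, 0, 0, 1, 0]
    clf_c = [0, 0, 1, 1, 1]
    y_true = [1, 0, 1, 1, 0]
    for name, pred in (("clf_a", clf_a), ("clf_b", clf_b), ("clf_c", clf_c)):
        print(name, accuracy(y_true, pred))  # 0.8, 0.8, 0.6
    ensemble = majority_vote([clf_a, clf_b, clf_c])
    print("ensemble", ensemble, accuracy(y_true, ensemble))  # [1, 0, 1, 1, 0] 1.0
```

In this toy case each base classifier makes a different mistake, so the majority vote corrects all of them. With binary labels, an odd number of voters also guarantees that no ties occur, so the tie rule never comes into play.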
License
Copyright (c) 2023 Mohammed E. Almandouh, Mohammed F. Alrahmawy, Mohamed Eisa, A. S. Tolba
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license lets others share and adapt the material, provided they give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.