Ensemble Based Low Complexity Arabic Fake News Detection

Mohammed E. Almandouh; Mohammed F. Alrahmawy; Mohamed Eisa; A. S. Tolba

Authors

Mohammed E. Almandouh, Information Systems Department, Faculty of Management Technology and Information Systems, Port Said University, Port Said, Egypt. https://orcid.org/0009-0004-7369-0855
Mohammed F. Alrahmawy, Head of Computer Science Department, Faculty of Computer and Information, Mansoura University, Mansoura, Egypt. https://orcid.org/0000-0001-8978-8268
Mohamed Eisa, Information Technology Department, Faculty of Management Technology and Information Systems, Port Said University, Port Said, Egypt. https://orcid.org/0000-0003-2685-0057
A. S. Tolba, Department of Computer Science, Faculty of Computers and Information, Mansoura University, Mansoura 35516, Egypt; New Heliopolis Institute for Engineering & Automotive and Energy Technologies, New Heliopolis, Egypt. https://orcid.org/0000-0002-2751-367X

Keywords:

Machine learning, Deep learning, Natural Language Processing, Fake News, Arabic Fake Detection (AFD), Ensemble Learning (EL)

Abstract

With the growth of online communication, many people use social media platforms and produce large volumes of content. Fake news creates negative perceptions in society, and the rise of social media networks has given it a platform to gain popularity among users rapidly. Identifying and labelling Arabic fake news is a major challenge because of the large volume of heterogeneous content and the limited number of relevant Arabic datasets. Machine learning (ML), natural language processing (NLP), and deep learning (DL) are commonly used to speed up and automate the analysis of this content and to transform unstructured text into a structured form. In this study, a corpus collected from news websites is developed for detecting fake news with machine learning techniques; it comprises a dataset of 3,185 fake news articles and a new dataset of 1,453 real news articles. This paper shows that, by combining machine learning and ensemble methods, a fake news prediction model can be built that reaches an accuracy of up to 100% with low complexity, which saves power and energy. The model can serve as a reference for detecting fake news in Egypt and other Arab countries.
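The abstract does not state which base classifiers or which aggregation rule the final ensemble uses. As a minimal sketch, assuming hard (majority) voting over the label predictions of several already-trained base classifiers, the aggregation step and the accuracy metric can be written in plain Python. The function names, the 1 = fake / 0 = real encoding, the tie-breaking rule, and the toy predictions are all hypothetical illustrations, not the authors' data, models, or results.

```python
from collections import Counter
from typing import Sequence

# Hypothetical label encoding (an assumption): 1 = fake, 0 = real.
FAKE, REAL = 1, 0


def majority_vote(per_model_preds: Sequence[Sequence[int]]) -> list[int]:
    """Hard-voting ensemble: for each article, return the label predicted by
    most base classifiers. per_model_preds[m][i] is model m's label for
    article i. Ties are broken in favour of FAKE (an assumption)."""
    n_items = len(per_model_preds[0])
    if any(len(p) != n_items for p in per_model_preds):
        raise ValueError("all models must predict the same number of items")
    combined = []
    for i in range(n_items):
        # Count how many models voted for each label on article i.
        votes = Counter(preds[i] for preds in per_model_preds)
        combined.append(FAKE if votes[FAKE] >= votes[REAL] else REAL)
    return combined


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of articles whose predicted label matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("length mismatch")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy predictions from three hypothetical base classifiers on five articles;
    # each base model misclassifies at least one article.
    model_a = [1, 0, 1, 0, 0]
    model_b = [1, 0, 0, 1, 0]
    model_c = [0, 0, 1, 1, 1]
    y_true = [1, 0, 1, 1, 0]
    ensemble = majority_vote([model_a, model_b, model_c])
    print(ensemble, accuracy(y_true, ensemble))
```

In this toy case the individual models score 0.8, 0.8, and 0.6, while the vote outputs `[1, 0, 1, 1, 0]` with accuracy 1.0, because each model's errors fall on different articles. Hard voting needs only the predicted labels, so it adds little cost at prediction time. A soft-voting variant would instead average the base classifiers' predicted class probabilities.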


