Unsupervised Misinformation Detection Model using Incremental K-Means Algorithm
Keywords:
Clustering, Incremental Learning, K-Means, Misinformation Detection, Unsupervised Learning
Abstract
State-of-the-art misinformation detection techniques rely mainly on supervised learning, which requires a large labeled dataset; the labeling demands manual effort and delays detection. An unsupervised approach to misinformation detection is therefore needed. Existing unsupervised approaches show only average performance because they fail to generate important textual and user-specific features. Further, real-world data is time-sensitive: large volumes are generated over time, and models must adapt to each newly arriving chunk. To address these problems, the authors propose a first-of-its-kind unsupervised misinformation detection model that uses an incremental learning approach to handle newly arriving data without requiring labels. To evaluate the model, the authors use silhouette score, purity, and the importance of individual features in cluster formation. The model achieved a purity score of 0.92 and an average silhouette score of 0.57.
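The ideas named in the abstract (chunk-wise incremental clustering, purity, and silhouette score) can be illustrated with a minimal sketch. This is not the authors' implementation: the `IncrementalKMeans` class, the caller-supplied seeding, the running-mean update rule, and the toy two-dimensional points are all assumptions made for illustration, and the labels are used only to compute purity, never for training.

```python
from collections import Counter
from math import sqrt


def euclidean(p, q):
    """Euclidean distance between two equal-length coordinate sequences."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


class IncrementalKMeans:
    """Online k-means sketch: centroids are updated as running means."""

    def __init__(self, initial_centroids):
        # Assumption: the caller supplies one starting point per cluster.
        self.centroids = [list(c) for c in initial_centroids]
        self.counts = [0] * len(self.centroids)

    def predict_one(self, x):
        # Index of the nearest centroid.
        return min(range(len(self.centroids)),
                   key=lambda j: euclidean(x, self.centroids[j]))

    def partial_fit(self, chunk):
        # Assign each new point, then move its centroid toward it by 1/n.
        for x in chunk:
            j = self.predict_one(x)
            self.counts[j] += 1
            eta = 1.0 / self.counts[j]
            self.centroids[j] = [c + eta * (xi - c)
                                 for c, xi in zip(self.centroids[j], x)]
        return self

    def predict(self, points):
        return [self.predict_one(x) for x in points]


def purity(cluster_ids, true_labels):
    """Fraction of points that carry the majority label of their cluster."""
    by_cluster = {}
    for c, y in zip(cluster_ids, true_labels):
        by_cluster.setdefault(c, Counter())[y] += 1
    majority = sum(cnt.most_common(1)[0][1] for cnt in by_cluster.values())
    return majority / len(true_labels)


def mean_silhouette(points, cluster_ids):
    """Average silhouette coefficient over all points."""
    scores = []
    for i, (p, ci) in enumerate(zip(points, cluster_ids)):
        # Distances from p to every other point, grouped by cluster.
        per_cluster = {}
        for j, (q, cj) in enumerate(zip(points, cluster_ids)):
            if i != j:
                per_cluster.setdefault(cj, []).append(euclidean(p, q))
        own = per_cluster.pop(ci, [])
        if not own or not per_cluster:
            scores.append(0.0)  # singleton cluster or only one cluster
            continue
        a = sum(own) / len(own)                                  # cohesion
        b = min(sum(d) / len(d) for d in per_cluster.values())   # separation
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data: two chunks arriving one after the other (hypothetical values).
chunk_1 = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9)]
chunk_2 = [(0.1, 0.2), (4.9, 5.0)]
labels = ["real", "real", "fake", "fake", "real", "fake"]  # evaluation only

model = IncrementalKMeans(initial_centroids=[(0.0, 0.0), (5.0, 5.0)])
model.partial_fit(chunk_1).partial_fit(chunk_2)  # no refit on old data

points = chunk_1 + chunk_2
clusters = model.predict(points)
print("purity:", purity(clusters, labels))
print("silhouette:", mean_silhouette(points, clusters))
```

Each call to `partial_fit` touches only the newly arrived chunk, which is the property that lets a model of this kind absorb new data without retraining from scratch. Purity lies between 0 and 1 and the silhouette coefficient between -1 and 1, so both are read as plain scores rather than percentages.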
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license requires users to give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.