Unsupervised Misinformation Detection Model using Incremental K-Means Algorithm

Yashoda Barve; Jatinderkumar R. Saini

Authors

Yashoda Barve, Suryadatta College of Management Information Research & Technology, Pune, India https://orcid.org/0000-0003-3422-2464
Jatinderkumar R. Saini, Symbiosis Institute of Computer Studies and Research, Symbiosis International (Deemed University), Pune, India https://orcid.org/0000-0001-5205-5263

Keywords:

Clustering, Incremental Learning, K-Means, Misinformation Detection, Unsupervised Learning

Abstract

State-of-the-art misinformation detection techniques have mainly focused on supervised learning. However, supervised learning requires a large labeled dataset, which entails manual effort and delays the detection of misinformation. An unsupervised approach to misinformation detection is therefore in demand. Existing unsupervised misinformation detection studies show only average performance because they fail to generate important textual and user-specific features. Further, real-world data is time-sensitive: large amounts of data are generated over time, and models need to adapt to each newly arriving chunk of data. To tackle these problems, the authors propose a first-of-its-kind unsupervised misinformation detection model that uses an incremental learning approach to handle newly arriving data without requiring labels. To evaluate the model's performance, the authors use metrics such as silhouette score, purity, and the importance of various features in cluster formation. The model achieved a purity score of 0.92 and an average silhouette score of 0.57.
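To make the idea of incremental clustering and the purity metric concrete, the following is a minimal sketch and not the authors' implementation. It assumes a generic sequential k-means update (each centroid is a running mean of the points assigned to it), hand-picked seed centroids, and made-up two-dimensional feature vectors; the class name `IncrementalKMeans`, the method names, and the toy labels are all hypothetical. Labels are used only to compute purity after clustering, never during fitting.

```python
from collections import Counter


def nearest(centroids, x):
    """Return the index of the centroid closest to x (squared Euclidean)."""
    dists = [sum((a - b) ** 2 for a, b in zip(c, x)) for c in centroids]
    return dists.index(min(dists))


class IncrementalKMeans:
    """Sequential k-means sketch: centroids are running means, updated per point."""

    def __init__(self, initial_centroids):
        self.centroids = [list(c) for c in initial_centroids]
        # Assumption: each seed centroid counts as one observed point.
        self.counts = [1] * len(self.centroids)

    def partial_fit(self, chunk):
        """Fold a newly arrived chunk into the existing centroids."""
        for x in chunk:
            j = nearest(self.centroids, x)
            self.counts[j] += 1
            eta = 1.0 / self.counts[j]  # step size shrinks as the cluster grows
            self.centroids[j] = [
                c + eta * (xi - c) for c, xi in zip(self.centroids[j], x)
            ]
        return self

    def predict(self, points):
        """Assign each point to its nearest current centroid."""
        return [nearest(self.centroids, x) for x in points]


def purity(cluster_ids, true_labels):
    """Fraction of points that carry the majority label of their cluster."""
    by_cluster = {}
    for cid, label in zip(cluster_ids, true_labels):
        by_cluster.setdefault(cid, Counter())[label] += 1
    majority = sum(max(counts.values()) for counts in by_cluster.values())
    return majority / len(true_labels)


# Toy feature vectors arriving in two chunks (made-up data, not from the paper).
chunk_1 = [(0.1, 0.2), (0.0, 0.1), (0.9, 1.0), (1.0, 0.8)]
chunk_2 = [(0.2, 0.0), (1.1, 0.9), (0.8, 1.1), (0.45, 0.4)]
labels = ["real", "real", "fake", "fake", "real", "fake", "fake", "fake"]

model = IncrementalKMeans(initial_centroids=[(0.0, 0.0), (1.0, 1.0)])
model.partial_fit(chunk_1)  # first batch of data
model.partial_fit(chunk_2)  # later batch, no retraining from scratch
assigned = model.predict(chunk_1 + chunk_2)
score = purity(assigned, labels)
print(assigned, score)
```

In this toy run, one "fake" item lands in the mostly "real" cluster, so purity is 7/8 = 0.875. The silhouette score mentioned in the abstract is not computed here; it would additionally require pairwise distances within and between clusters.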


