The Evaluation of Distributed Topic Models for Recognition of Health-Related Topics in Social Media Through Machine Learning Paradigms

Authors

  • Yerragudipadu Subbarayudu, Research Scholar, Computer Science and Engineering, Jawaharlal Nehru Technological University, Anantapur, 51502, AP, India
  • Alladi Sureshbabu, Department of Computer Science and Engineering, JNTUA College of Engineering, Anantapur, 515002, AP, India

Keywords:

Twitter, Hadoop, Machine Learning, LDA, CTM, NMI, TF, TFIDF, DTM, HDFS

Abstract

Social media is an effective source of large volumes of health-related data worldwide, particularly tweets. It is also a well-known data source for anticipating healthcare-related solutions and searching for health-related phrases. In terms of income and employment, the healthcare industry has grown into one of the largest in the world. Twitter, which is used by millions of people every day, lets people share opinions and thoughts on a range of healthcare-related issues.

The research gap in topic modeling of healthcare topics in social media consists of areas that existing studies have not explored extensively or addressed adequately. Several potential gaps stand out. First, many studies focus on general healthcare discussions, while specific healthcare topics or subdomains have received little attention; research could apply topic modeling to niche areas such as mental health, rare diseases, specific treatments, or emerging healthcare technologies. Second, most topic modeling studies of healthcare discussions do not account for user context and demographics; research could investigate how user characteristics such as age, gender, location, or occupation influence the topics discussed, giving a deeper understanding of how different demographics engage with healthcare topics. Third, social media platforms are highly dynamic, and the popularity and sentiment of healthcare topics can change rapidly, so research is needed on the temporal dynamics of healthcare topics and on tracking how topics evolve over time. Fourth, social media content is not only text but also visual and audiovisual; research could explore topic modeling techniques that integrate and analyze multimodal data, such as images, videos, or audio, in healthcare-related discussions. Fifth, existing evaluation metrics for topic modeling may not capture the particular characteristics and challenges of healthcare-related discussions, so developing domain-specific metrics, or adapting existing ones to better assess the quality and relevance of topics, is an important research direction. Finally, social media data raises ethical concerns about privacy, consent, and data usage; gaps remain in ethical guidelines, data anonymization techniques, and best practices for conducting topic modeling research on healthcare topics while ensuring privacy and confidentiality.

Addressing these gaps can contribute to a more comprehensive understanding of healthcare topics in social media discussions and provide valuable insights for healthcare practitioners, policymakers, and researchers. It can help identify emerging healthcare trends and public sentiment, and inform evidence-based decision-making in the healthcare domain. The main objective of this research is to apply topic modeling methods such as CvLDA and DiCTM to healthcare topics in social media, so that researchers and practitioners can gain insights into the prevalent themes, concerns, and discussions in the online healthcare domain. This enables the identification of emerging topics, the monitoring of public perceptions and sentiments, the discovery of valuable information for public health interventions, and the understanding of patient experiences and needs in the digital space.

Published

21.09.2023

How to Cite

Subbarayudu, Y., & Sureshbabu, A. (2023). The Evaluation of Distributed Topic Models for Recognition of Health-Related Topics in Social Media Through Machine Learning Paradigms. International Journal of Intelligent Systems and Applications in Engineering, 11(4), 511–534. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/3586

Section

Research Article