The Interplay between Natural Language Processing (NLP) and Clinical Data Mining in Healthcare: A Review

Authors

  • Shashank Agarwal, Praveen Gujar, Sriram Panyam

Keywords:

natural language processing, healthcare, clinical data, electronic health records, transfer learning, python library

Abstract

Natural Language Processing (NLP) has emerged as a transformational force in the healthcare industry, offering innovative ways to extract, generate, and process clinical data. This review examines the role of NLP in reshaping the healthcare sector through the extraction of essential medical information from multiple sources, such as Electronic Health Records (EHRs). Its objective is to show how extracting medical information from clinical data can improve patient care, disease detection, clinical decision-making, patient compliance, and medical transcription. The review assesses various NLP techniques in detail, from rule-based techniques to statistical approaches and large language models that use transfer learning. It then surveys NLP libraries and frameworks that are widely used across a variety of fields. It also covers the types of clinical data that can be refined and utilized through NLP, followed by the most common NLP libraries adapted to each healthcare application. Applications and uses of NLP in healthcare systems are also discussed, paving the way for further research in health information technology. Although NLP holds strong promise for patient care, linguistic diversity, unstructured data, and semantic ambiguity remain significant barriers to its implementation. The article therefore highlights the need for continuous improvement in NLP techniques to ensure the accuracy, efficiency, and reliability of data extraction, interpretation, and application within the healthcare domain.

Published

02.06.2024

How to Cite

Shashank Agarwal. (2024). The Interplay between Natural Language Processing (NLP) and Clinical Data Mining in Healthcare: A Review. International Journal of Intelligent Systems and Applications in Engineering, 12(3), 4161–4169. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/6120

Section

Research Article