Biomedical Document Enhancement through Probabilistic Graph Clustering: Indexing and Key Phrase Mining

Authors

  • Jose Mary Golamari, Research Scholar, Dept. of CSE, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur, Andhra Pradesh, India.
  • D. Haritha, Professor, Dept. of CSE, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur, Andhra Pradesh, India.

Keywords:

biomedical documents, gene data, feature extraction, document mapping

Abstract

As biomedical databases grow along with the number of catalogued genes and diseases, identifying key feature sets becomes difficult because of large data volumes and sparsity. Extracting relevant biomedical features from documents, ranking them by probability, and clustering them are central steps in biomedical document feature extraction, ranking, and classification. In the proposed approach, the ABNER tagger extracts gene/protein features from pre-processed documents, and the highest-probability biomedical features are selected to initialize the graph. Graph-based clustering is then performed according to the relationships between gene/protein terms in the ranked document set. To improve cluster quality, a novel graph similarity measure is employed that maximizes a probabilistic entropy measure and prioritizes gene-based ranked document clustering. Experimental results show that the proposed model improves on conventional models.
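The abstract does not specify the similarity measure or the clustering algorithm, so the following Python sketch is only a rough illustration of the general pipeline shape, under stated assumptions. It assumes gene/protein mentions have already been extracted per document (for example by ABNER, a Java tool that is not invoked here), estimates each term's probability as its document frequency, keeps the top-k terms, links documents whose Jaccard similarity over those terms meets a threshold, and reports clusters as connected components with a Shannon-entropy score per cluster. Jaccard similarity, connected components, the entropy score, and all function names and parameters (`k`, `threshold`) are hypothetical choices for illustration; they are not the authors' novel entropy-based graph similarity measure.

```python
from collections import Counter
from math import log2


def term_probabilities(docs):
    """Estimate P(term) as the fraction of documents that mention the term."""
    df = Counter()
    for terms in docs.values():
        df.update(set(terms))  # count each term at most once per document
    n = len(docs)
    return {t: c / n for t, c in df.items()}


def top_terms(probs, k):
    """Keep the k highest-probability terms; ties are broken alphabetically."""
    return set(sorted(probs, key=lambda t: (-probs[t], t))[:k])


def build_graph(docs, keep, threshold):
    """Hypothetical document graph: an edge joins two documents when the
    Jaccard similarity of their kept gene/protein terms is >= threshold."""
    filtered = {d: set(ts) & keep for d, ts in docs.items()}
    ids = sorted(filtered)
    graph = {d: set() for d in ids}
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            union = filtered[a] | filtered[b]
            if not union:
                continue
            sim = len(filtered[a] & filtered[b]) / len(union)
            if sim >= threshold:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def connected_components(graph):
    """Treat each connected component of the graph as one cluster."""
    seen, clusters = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            node = stack.pop()
            comp.append(node)
            for nb in graph[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        clusters.append(sorted(comp))
    return clusters


def cluster_entropy(cluster, docs, keep):
    """Shannon entropy (bits) of kept-term counts in a cluster.
    Lower values mean the cluster is focused on fewer genes."""
    counts = Counter(t for d in cluster for t in docs[d] if t in keep)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # Hypothetical pre-tagged documents (document id -> gene/protein mentions).
    docs = {
        "d1": ["BRCA1", "TP53", "BRCA2"],
        "d2": ["BRCA1", "BRCA2"],
        "d3": ["EGFR", "KRAS"],
        "d4": ["EGFR", "KRAS", "TP53"],
        "d5": ["MYC"],
    }
    keep = top_terms(term_probabilities(docs), k=5)
    for c in connected_components(build_graph(docs, keep, threshold=0.5)):
        print(c, round(cluster_entropy(c, docs, keep), 3))
```

On this toy input, the low-frequency term `MYC` is dropped by the top-k filter, so `d5` ends up as a singleton, while the BRCA and EGFR/KRAS documents form separate clusters. A fuller treatment would replace the Jaccard threshold and connected components with the paper's own similarity measure and clustering procedure.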

Published

25.12.2023

How to Cite

Golamari, J. M., & Haritha, D. (2023). Biomedical Document Enhancement through Probabilistic Graph Clustering: Indexing and Key Phrase Mining. International Journal of Intelligent Systems and Applications in Engineering, 12(1), 543–551. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/3952

Issue

Vol. 12, No. 1

Section

Research Article