Biomedical Document Enhancement through Probabilistic Graph Clustering: Indexing and Key Phrase Mining
Keywords:
biomedical documents, gene data, feature extraction, document mapping
Abstract
As biomedical databases grow in size along with the number of genes and diseases they cover, finding key feature sets becomes difficult because of large data volumes and sparsity. Extracting relevant biomedical features from documents, ranking them by probability, and clustering them are central steps in biomedical document feature extraction, ranking, and classification. Gene/protein features are extracted from pre-processed documents with the ABNER tagger, and the highest-probability biomedical features are used to initialize the graph. Graph-based clustering is then performed on the relationships between gene/protein terms in the ranked document set. To improve cluster quality, a novel graph similarity measure is employed that maximizes a probabilistic entropy measure and prioritizes gene-based ranked document clustering. Experimental results indicate that the proposed model improves on conventional models.
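The pipeline described in the abstract (rank terms by probability, link documents through shared gene/protein terms, then cluster the graph) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy documents, the function names, the Jaccard overlap used as the similarity measure, the connected-components clustering, and the cluster-size entropy are all assumptions chosen for illustration. The terms are assumed to be tagged already; the ABNER tagging step is not reproduced.

```python
from collections import Counter
from itertools import combinations
import math

# Hypothetical toy corpus: gene/protein terms assumed to be tagged already.
docs = {
    "d1": ["BRCA1", "TP53", "BRCA1"],
    "d2": ["TP53", "BRCA1"],
    "d3": ["EGFR", "KRAS"],
    "d4": ["KRAS", "EGFR", "EGFR"],
}

def term_probabilities(docs):
    """Relative frequency of each term over the whole corpus."""
    counts = Counter(t for terms in docs.values() for t in terms)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def top_terms(probs, k):
    """Keep the k highest-probability terms (ties broken alphabetically)."""
    return set(sorted(probs, key=lambda t: (-probs[t], t))[:k])

def jaccard(a, b):
    """Set overlap; a stand-in for the paper's graph similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_documents(docs, k=4, threshold=0.5):
    """Link documents with similar top-term sets; return connected components."""
    keep = top_terms(term_probabilities(docs), k)
    features = {d: set(terms) & keep for d, terms in docs.items()}
    adjacency = {d: set() for d in docs}
    for a, b in combinations(docs, 2):
        if jaccard(features[a], features[b]) >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, clusters = set(), []
    for start in docs:
        if start in seen:
            continue
        stack, component = [start], []
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.append(node)
            stack.extend(adjacency[node] - seen)
        clusters.append(sorted(component))
    return clusters

def cluster_entropy(clusters):
    """Shannon entropy (bits) of the cluster-size distribution; illustrative only."""
    n = sum(len(c) for c in clusters)
    return -sum((len(c) / n) * math.log2(len(c) / n) for c in clusters)
```

On the toy corpus, `cluster_documents(docs)` groups `d1` with `d2` and `d3` with `d4`, and `cluster_entropy` of that result is 1.0 bit. A real system would replace the Jaccard overlap and the entropy function with the similarity and entropy measures defined in the paper.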
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Under this license, readers must give appropriate credit, provide a link to the license, and indicate whether changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.