Classification of DNA sequences: Analysis of performance of SVM using a novel integral kernel function

Authors

  • Mahesha Y, Vishwanath M K, Nagaraju C, Ravindra S

Keywords:

DNA, K-mer, Machine learning, RBF, Polynomial, Integral Kernel

Abstract

DNA sequence classification plays a vital role in bioinformatics, where it is used to categorize unknown DNA sequences. This article proposes a new approach to DNA classification. DNA sequences belonging to six different classes are considered. The support vector machine (SVM), a popular classification model, is analysed, with particular attention to two kernel functions: the Radial Basis Function (RBF) and the Polynomial kernel. These two kernels are integrated into a single kernel, and an experiment is carried out using the resulting integral kernel. Performance is measured using Precision, Recall, F1-score, and Accuracy, and is also illustrated with Precision-Recall and ROC curves. The individual accuracies achieved by the RBF and Polynomial kernels are 80.6% and 82.7%, respectively, whereas the proposed integral kernel achieves an accuracy of 98.4%. The results show that the proposed integral kernel outperforms both the RBF and Polynomial kernels.
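
The abstract does not state how the two kernels are integrated, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that sequences are represented by normalised k-mer frequencies (with k = 3) and that the integral kernel is a weighted sum of an RBF and a polynomial kernel, which is one common way to combine kernels while keeping the result a valid kernel. The names `kmer_frequencies` and `combined_kernel`, the parameter values, and the toy sequences are all hypothetical.

```python
from itertools import product

import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel
from sklearn.svm import SVC

K = 3  # k-mer length (an assumption; the abstract gives no value)
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def kmer_frequencies(seq):
    """Return the normalised K-mer count vector of a DNA sequence."""
    counts = np.zeros(len(KMERS))
    seq = seq.upper()
    for i in range(len(seq) - K + 1):
        idx = KMER_INDEX.get(seq[i:i + K])  # k-mers with other symbols are skipped
        if idx is not None:
            counts[idx] += 1
    total = counts.sum()
    return counts / total if total else counts


def combined_kernel(X, Y, weight=0.5, gamma=1.0, degree=3, coef0=1.0):
    """Weighted sum of an RBF and a polynomial kernel (assumed combination)."""
    rbf = rbf_kernel(X, Y, gamma=gamma)
    poly = polynomial_kernel(X, Y, degree=degree, gamma=gamma, coef0=coef0)
    return weight * rbf + (1.0 - weight) * poly


# Hypothetical toy data: three AT-rich and three GC-rich sequences.
sequences = [
    "ATATATATATAT", "TATATATTATAT", "ATTATATATATA",
    "GCGCGCGCGCGC", "CGCGCGGCGCGC", "GCCGCGCGCGCG",
]
labels = [0, 0, 0, 1, 1, 1]

X = np.array([kmer_frequencies(s) for s in sequences])

# scikit-learn's SVC accepts a callable that returns the kernel matrix.
clf = SVC(kernel=combined_kernel).fit(X, labels)
pred = clf.predict(np.array([kmer_frequencies("ATATTATATATA")]))
print(pred)
```

A sum with non-negative weights is used here because it preserves positive semi-definiteness; a product of the two kernels would be an equally valid alternative, and the source does not say which, if either, was used.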

Published

24.03.2024

How to Cite

Mahesha Y. (2024). Classification of DNA sequences: Analysis of performance of SVM using a novel integral kernel function. International Journal of Intelligent Systems and Applications in Engineering, 12(3), 3737–3743. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/6050

Section

Research Article