Clustering of Mitochondrial D-loop Sequences Using Similarity Matrix, PCA and K-means Algorithm

  • Can Eyüpoğlu
Keywords: Clustering, p-distance, PCA, Jukes-Cantor, K-means algorithm, Similarity matrix


In this study, mitochondrial displacement-loop (D-loop) sequences isolated from different hominid species are clustered using a similarity matrix, Principal Component Analysis (PCA), and the K-means algorithm. First, the mitochondrial D-loop sequence data are retrieved from the GenBank database and imported into MATLAB. Pairwise distances are computed using the p-distance and Jukes-Cantor methods. A phylogenetic tree is constructed, and a similarity matrix is then generated from the pairwise distances. Clustering is first performed using the K-means algorithm alone; PCA and K-means are then used together to cluster the mitochondrial D-loop sequences.
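
The distance and clustering steps named in the abstract can be illustrated with a minimal sketch. It is written in Python rather than the MATLAB environment used in the study, and it is not the author's implementation. The four short sequences are made-up toy data (not GenBank records), the function names are hypothetical, and the distance-to-similarity conversion (1 minus the distance divided by the largest distance) is an assumption, since the abstract does not state the formula used. The PCA step is omitted to keep the example dependency-free; K-means is run directly on the rows of the similarity matrix.

```python
import math

def p_distance(a, b):
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def jukes_cantor(p):
    """Jukes-Cantor correction d = -(3/4) ln(1 - (4/3) p), defined for p < 0.75."""
    if p >= 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def distance_matrix(seqs, corrected=True):
    """Symmetric pairwise matrix of p-distances or Jukes-Cantor distances."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = p_distance(seqs[i], seqs[j])
            d[i][j] = d[j][i] = jukes_cantor(p) if corrected else p
    return d

def similarity_matrix(d):
    """Assumed conversion: s = 1 - d / max(d), so identical sequences score 1."""
    top = max(max(row) for row in d) or 1.0
    return [[1.0 - v / top for v in row] for row in d]

def kmeans(points, k, iters=100):
    """Plain Lloyd's K-means; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy aligned sequences: two look-alike pairs, interleaved (hypothetical data).
seqs = [
    "ACGTACGTACGTACGTACGT",  # group A
    "TTGTACCTAGGTACGAACCT",  # group B
    "ACGTACGTACGTACGTACGA",  # group A, one substitution
    "TTGTACCTAGGTACGAACCA",  # group B, one substitution
]

dist = distance_matrix(seqs, corrected=True)
sim = similarity_matrix(dist)
labels = kmeans(sim, k=2)  # each row of the similarity matrix is a feature vector
print(labels)
```

Seeding the centroids with the first k points keeps the run reproducible; a practical analysis would more likely use random or k-means++ initialisation with several restarts, and would apply PCA to the similarity matrix before clustering when comparing the two approaches described above.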




How to Cite
C. Eyüpoğlu, “Clustering of Mitochondrial D-loop Sequences Using Similarity Matrix, PCA and K-means Algorithm”, IJISAE, pp. 244-248, Dec. 2016.
Research Article