Clustering of Mitochondrial D-loop Sequences Using Similarity Matrix, PCA and K-means Algorithm

  • Can Eyüpoğlu
Keywords: Clustering, p-distance, PCA, Jukes-Cantor, K-means algorithm, Similarity matrix


In this study, mitochondrial displacement-loop (D-loop) sequences isolated from different hominid species are clustered using a similarity matrix, Principal Component Analysis (PCA) and the K-means algorithm. First, the mitochondrial D-loop sequence data are retrieved from the GenBank database and copied into MATLAB. Pairwise distances are then computed using the p-distance and Jukes-Cantor methods. A phylogenetic tree is constructed, and a similarity matrix is generated from the pairwise distances. Clustering is first performed using the K-means algorithm alone; PCA and K-means are then used together to cluster the mitochondrial D-loop sequences.
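
The two distance measures named above can be illustrated with a minimal sketch. It is written in Python rather than the MATLAB environment used in the study, and it is not the author's implementation. It assumes the sequences are already aligned to equal length and that only sites with an unambiguous base (A, C, G, T) in both sequences are compared. The function names and the toy sequences are hypothetical. The Jukes-Cantor distance is the standard correction d = -(3/4) ln(1 - (4/3) p) applied to the p-distance p.

```python
import math


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: skip sites where either sequence has a gap or ambiguity code.
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor correction of a p-distance; finite only for p < 0.75."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(seqs, corrected=True):
    """Symmetric pairwise distance matrix as a list of lists."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = p_distance(seqs[i], seqs[j])
            d[i][j] = d[j][i] = jukes_cantor(p) if corrected else p
    return d


# Toy aligned sequences (hypothetical, not GenBank data).
toy = ["ACGTACGT", "ACGTACGA", "ACGAACGA"]
p_matrix = distance_matrix(toy, corrected=False)
jc_matrix = distance_matrix(toy, corrected=True)
```

The Jukes-Cantor value is always at least as large as the corresponding p-distance, because it accounts for multiple substitutions at the same site. How the study converts these distances into a similarity matrix is not stated here, so that step is left out of the sketch.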


C. Eyüpoğlu, “Clustering of Mitochondrial D-loop Sequences Using Similarity Matrix, PCA and K-means Algorithm”, IJISAE, pp. 244-248, Dec. 2016.
