Proposal of Machine Learning Approach for Identification of Instant Messaging Applications in Raw Network Traffic
DOI:
https://doi.org/10.18201/ijisae.2018642060
Keywords:
Encrypted Traffic Identification, Network flow, Security, Machine Learning, Network Forensics
Abstract
Identifying Internet protocols from raw network traffic or network flows plays a crucial role in maintaining and improving the security of computer systems. A significant amount of research has explored a variety of identification techniques. Although a certain level of success has been achieved in detecting network protocols in unencrypted traffic, accuracy and performance remain rather poor for encrypted traffic. Following current technological trends, new and existing applications increasingly adopt encryption to protect information and privacy. Classification of encrypted network traffic is therefore essential for ensuring security. Moreover, labelling network protocols and applications is a necessary step in network forensic investigations. In this study, we propose a method to automatically identify instant messaging applications from raw network traffic. To this end, we first extract flow-based static features from network captures and then apply machine learning algorithms. The proposed method is evaluated on a fairly large dataset, which comprises the publicly available NIMS dataset and the network traffic of 9 popular instant messaging applications collected in a controlled environment. Overall, the dataset contains 716,607 network flows belonging to 20 application categories. The proposed method classifies network flows of instant messaging applications into their corresponding application categories with an accuracy over 0.99 and an F1-score of 0.99.
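To illustrate what extracting flow-based features from a capture can look like, the following is a minimal sketch and not the authors' implementation. It groups packets into bidirectional flows by their 5-tuple and computes a few per-flow statistics. The packet records, the `flow_key` and `extract_flow_features` helpers, and the chosen feature names are all assumptions made for illustration; the abstract does not list the actual feature set.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical packet records (illustrative values, not from the paper's dataset):
# (timestamp_s, src_ip, src_port, dst_ip, dst_port, protocol, size_bytes)
PACKETS = [
    (0.00, "10.0.0.2", 50000, "203.0.113.5", 443, "TCP", 120),
    (0.05, "203.0.113.5", 443, "10.0.0.2", 50000, "TCP", 1400),
    (0.30, "10.0.0.2", 50000, "203.0.113.5", 443, "TCP", 80),
    (0.10, "10.0.0.3", 40000, "198.51.100.7", 5222, "TCP", 200),
    (0.40, "198.51.100.7", 5222, "10.0.0.3", 40000, "TCP", 300),
]


def flow_key(src_ip, src_port, dst_ip, dst_port, proto):
    """Build a direction-independent key so both directions map to one flow."""
    a, b = (src_ip, src_port), (dst_ip, dst_port)
    return (min(a, b), max(a, b), proto)


def extract_flow_features(packets):
    """Group packets into flows and compute simple per-flow statistics."""
    flows = defaultdict(list)
    # Sort by timestamp so inter-arrival times are computed in order.
    for ts, sip, sport, dip, dport, proto, size in sorted(packets):
        flows[flow_key(sip, sport, dip, dport, proto)].append((ts, size))

    features = {}
    for key, pkts in flows.items():
        times = [t for t, _ in pkts]
        sizes = [s for _, s in pkts]
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        features[key] = {
            "packet_count": len(pkts),
            "total_bytes": sum(sizes),
            "mean_size": mean(sizes),
            "std_size": pstdev(sizes),
            "duration": times[-1] - times[0],
            "mean_iat": mean(gaps) if gaps else 0.0,  # mean inter-arrival time
        }
    return features


FLOW_FEATURES = extract_flow_features(PACKETS)
```

Each resulting feature dictionary could then be turned into a numeric vector and passed to a classifier. None of these features depend on payload contents, which is why statistics of this kind remain usable when the traffic is encrypted.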
License
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license lets others copy, redistribute, remix, transform, and build upon the material, provided that they give appropriate credit, provide a link to the license, indicate if changes were made, and distribute their contributions under the same license as the original.