Network Traffic Classification via Kernel Based Extreme Learning Machine

Classifying Internet traffic to make Internet use more efficient is especially important for network administrators who manage corporate networks, and studies on Internet traffic classification have increased recently. These studies aim to improve the quality of service on the network, use the network efficiently, and create service packages to offer to users. The first method used to classify Internet traffic relied on port numbers. It was a fast and effective method in the early days of the Internet, but it has since lost its validity. Another method is payload-based classification, also called deep packet inspection, which classifies traffic by identifying signatures in the packets flowing through the network. A third approach, which is widely used today and is adopted in this study, is machine learning, here in the form of the kernel-based extreme learning machine (KELM). In this study, accuracies above 95% were achieved using different activation functions.


Introduction
Traffic classification methods are used to make efficient use of network resources for data traffic, to analyse users through network data, to manage and plan network resources, and to detect attacks and anomalies on the network [1,2]. Recently, network traffic classification has been widely used to improve the quality of service in large networks [3,4], to use the network effectively, to develop new service packages and to perform Internet traffic analysis [5]. Internet traffic analysis can be done either online or offline. In online traffic analysis, each data packet on the network is captured and analysed [6,7]. In offline traffic analysis, the network traffic flow is first captured and stored, and the stored flow is then analysed and classified [8]. In this paper, offline traffic analysis was performed.

Three kinds of classification technique have been used in the literature: port-based, payload-based and machine learning based. Port-based classification compares the port information retrieved from flow data with the port numbers assigned to protocols by the Internet Assigned Numbers Authority (IANA) [9]. For instance, port 80 is used for HTTP and port 23 for Telnet traffic. Especially with the widespread use of peer-to-peer (P2P) applications, this method has lost much of its effectiveness, because some applications use non-standard port numbers to evade firewalls and network security tools, and others use port hiding and dynamic ports [10][11][12][13].

Payload-based classification classifies Internet traffic by analysing the payloads of TCP/UDP packets. The analysis determines whether the payloads contain the characteristic signatures of known applications [10,14]. It works quite well when the packets are not encrypted. However, this technique is not widely preferred today for the following reasons:
• It raises privacy and security concerns,
• Some applications communicate using encrypted packets,
• It can only recognize signatures that are already known from previous classification work,
• It requires high processing and storage capacity, so it is not suitable for real-time classification [11][12][13].

Classification with machine learning algorithms is currently the most popular network traffic classification method. Studies usually perform classification with supervised or unsupervised learning algorithms: supervised algorithms use the classification analysis methods of data mining, while unsupervised algorithms use clustering analysis methods. Machine learning algorithms classify network traffic in two steps: first a classification model is built, and then the model is used to perform the classification. Statistical methods and calculations are usually used in the classification process. In flow-based classification, machine learning methods use the following statistical attributes of TCP and UDP flows:
• Total size statistics,
• Total number of forward and backward packets,
• Total number of forward and backward bytes,
• Inter-arrival time between packets,
• Flow duration.

There are many studies on Internet traffic classification using machine learning. L. Yingqiu et al. classified network traffic on the original and log-transformed data sets using the K-means algorithm [13]. J. Erman used the AutoClass, K-means and DBSCAN clustering algorithms [15]. S. Zander applied the Expectation Maximization (EM) algorithm together with attribute selection to the open data sets of the WAND Research Group and to other data sets [16]. S. Zander et al. and S. Agrawal et al. made comprehensive comparisons of classification algorithms and attribute selection methods such as C4.5, Bayes Net and Naïve Bayes [16][17][18][19][20]. Apart from port, payload and machine learning based classification, T. Karagiannis et al. developed an important classification method based on the behaviour of hosts rather than on TCP and UDP protocol information [6]. The technique of L. Bernaille et al., which uses unsupervised learning and inspects only the first few TCP packets of a connection, is also among the important studies in the literature [22].

Recently, the Extreme Learning Machine (ELM), proposed by Huang et al., has attracted great interest in the fields of machine learning and data mining, and it has been applied in many areas [21][23][24][25][35]. ELM was proposed as a new learning algorithm for Single-hidden Layer Feedforward Neural Networks (SLFNs) [23,27]. Conventional SLFN training updates the network weights iteratively based on the gradient. In ELM, by contrast, the input weights and biases are selected randomly, and the output weights are computed analytically. As a result, ELM offers fast learning, good generalization performance and a low computational load [27][28][29]. Among the activation functions used with the classic ELM algorithm in this paper, the tangent sigmoid and triangular basis functions gave the highest accuracy. With the kernel-based extreme learning machine (KELM), which uses radial basis and polynomial kernel functions instead of the classic ELM activation functions, the accuracy was observed to be even higher. This paper shows that KELM classifies Internet traffic quickly and with high accuracy.
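The flow-based statistics listed above can be illustrated with a short sketch. It assumes a hypothetical `Packet` record holding a timestamp in seconds, a size in bytes and a direction flag; the feature names are illustrative and are not the exact feature set of the data set used in this study.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Packet:
    timestamp: float  # capture time in seconds
    size: int         # packet size in bytes
    forward: bool     # hypothetical direction flag: True = client -> server


def flow_features(packets):
    """Compute a few flow-level statistics of the kind used for flow-based classification."""
    pkts = sorted(packets, key=lambda p: p.timestamp)
    fwd = [p for p in pkts if p.forward]
    bwd = [p for p in pkts if not p.forward]
    sizes = [p.size for p in pkts]
    # Inter-arrival times between consecutive packets of the flow
    gaps = [b.timestamp - a.timestamp for a, b in zip(pkts, pkts[1:])]
    return {
        "fwd_packets": len(fwd),
        "bwd_packets": len(bwd),
        "fwd_bytes": sum(p.size for p in fwd),
        "bwd_bytes": sum(p.size for p in bwd),
        "min_size": min(sizes),
        "max_size": max(sizes),
        "mean_size": mean(sizes),
        "mean_iat": mean(gaps) if gaps else 0.0,
        "duration": pkts[-1].timestamp - pkts[0].timestamp,
    }
```

Each flow is reduced to a fixed-length feature vector, which is the form of input the ELM and KELM classifiers expect.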
The second part of this paper describes the working principles of ELM algorithms. The third part describes how the data were obtained and which data were used for classification in the experimental studies; different classification algorithms are also compared.

Proposed Methodology
In this study, the data collected by Moore et al. [34] on the University of Cambridge campus were used. The most important reason for this choice is that it allows the results to be compared with previous studies that used the same data. Another important factor is that the data contain flows belonging to different classes. In addition, using an existing, publicly available data set provides a basis for more reliable results.

Extreme Learning Machine
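Extreme Learning Machine was proposed by G.B. Huang et al. [28][29][30][31]. Since 2004, ELM has been used for training Single-hidden Layer Feedforward Neural Networks (SLFNs) [23,24]. Unlike the conventional approach to training SLFNs, ELM uses randomly generated computational nodes in the hidden layer and computes the output weights analytically by solving a general linear system. ELM theory was later extended to "generalized" SLFNs, in which the hidden nodes need not be neuron-like. The generic SLFN architecture is shown in (Figure.1).

Consider N distinct training samples $(\mathbf{x}_j, \mathbf{t}_j)$, $j = 1, \dots, N$, and an SLFN with $L$ hidden nodes and activation function $g(\cdot)$. If the SLFN can approximate all N samples with zero error, then there exist parameters $(\mathbf{w}_i, b_i)$ and $\boldsymbol{\beta}_i$ such that:

$$\sum_{i=1}^{L} \boldsymbol{\beta}_i \, g(\mathbf{w}_i \cdot \mathbf{x}_j + b_i) = \mathbf{t}_j, \qquad j = 1, \dots, N.$$

The above N equations can be written equivalently in compact matrix form as

$$\mathbf{H}\boldsymbol{\beta} = \mathbf{T}, \qquad \mathbf{H} = \begin{bmatrix} g(\mathbf{w}_1 \cdot \mathbf{x}_1 + b_1) & \cdots & g(\mathbf{w}_L \cdot \mathbf{x}_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(\mathbf{w}_1 \cdot \mathbf{x}_N + b_1) & \cdots & g(\mathbf{w}_L \cdot \mathbf{x}_N + b_L) \end{bmatrix}_{N \times L},$$

where $\mathbf{H}$ is called the hidden-layer output matrix of the SLFN, $\boldsymbol{\beta} = [\boldsymbol{\beta}_1, \dots, \boldsymbol{\beta}_L]^{T}$ is the $L \times m$ output weight matrix, and $\mathbf{T} = [\mathbf{t}_1, \dots, \mathbf{t}_N]^{T}$ is the $N \times m$ matrix of output labels for the N data samples ($m$ being the number of outputs).

Training such an SLFN is equivalent to finding a least-squares solution $\hat{\boldsymbol{\beta}}$ of the linear system $\mathbf{H}\boldsymbol{\beta} = \mathbf{T}$, that is:

$$\|\mathbf{H}\hat{\boldsymbol{\beta}} - \mathbf{T}\| = \min_{\boldsymbol{\beta}} \|\mathbf{H}\boldsymbol{\beta} - \mathbf{T}\|.$$

If the number of hidden nodes $L$ equals the number of distinct training samples $N$, it is possible to find a $\hat{\boldsymbol{\beta}}$ for which the training error reaches zero. The hidden-layer output matrix $\mathbf{H}$ is then an invertible square matrix, so the solution of the linear system is:

$$\hat{\boldsymbol{\beta}} = \mathbf{H}^{-1}\mathbf{T}.$$

If the number of hidden nodes $L$ is less than the number of distinct training samples $N$, the smallest training error $\|\mathbf{H}\boldsymbol{\beta} - \mathbf{T}\|$ is obtained with:

$$\hat{\boldsymbol{\beta}} = \mathbf{H}^{\dagger}\mathbf{T},$$

where $\mathbf{H}^{\dagger}$ is the Moore-Penrose generalized inverse of $\mathbf{H}$ [32]. The regularized least-squares solution, derived from the Karush-Kuhn-Tucker (KKT) conditions, can be written as

$$\boldsymbol{\beta} = \mathbf{H}^{T}\left(\frac{\mathbf{I}}{C} + \mathbf{H}\mathbf{H}^{T}\right)^{-1}\mathbf{T},$$

where $\mathbf{H}$ is the hidden-layer output matrix, $C$ is the regularization coefficient, and $\mathbf{T}$ is the expected output matrix of the samples. The output function of the ELM learning algorithm is then

$$f(\mathbf{x}) = \mathbf{h}(\mathbf{x})\boldsymbol{\beta} = \mathbf{h}(\mathbf{x})\mathbf{H}^{T}\left(\frac{\mathbf{I}}{C} + \mathbf{H}\mathbf{H}^{T}\right)^{-1}\mathbf{T},$$

where $\mathbf{h}(\mathbf{x})$ is the row vector of hidden-neuron outputs (the feature mapping) for input $\mathbf{x}$. If the feature mapping $\mathbf{h}(\mathbf{x})$ is unknown, a kernel matrix for ELM can be defined based on Mercer's conditions as follows [33]:

$$\boldsymbol{\Omega}_{\mathrm{ELM}} = \mathbf{H}\mathbf{H}^{T}: \qquad \Omega_{\mathrm{ELM}\,i,j} = \mathbf{h}(\mathbf{x}_i) \cdot \mathbf{h}(\mathbf{x}_j) = K(\mathbf{x}_i, \mathbf{x}_j).$$

Thus, the output function $f(\mathbf{x})$ of the kernel-based ELM can be written compactly as

$$f(\mathbf{x}) = \begin{bmatrix} K(\mathbf{x}, \mathbf{x}_1) \\ \vdots \\ K(\mathbf{x}, \mathbf{x}_N) \end{bmatrix}^{T} \left(\frac{\mathbf{I}}{C} + \boldsymbol{\Omega}_{\mathrm{ELM}}\right)^{-1}\mathbf{T}.$$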
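The ELM and kernel ELM equations above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a tanh (tangent sigmoid) hidden activation for the classic ELM, a one-hot target matrix T, an RBF kernel of the form exp(-gamma * ||x - y||^2) and a polynomial kernel of the form (x . y + c)^d. The function names and the parameter names `gamma`, `c` and `d` are hypothetical, and the paper does not state which kernel parameterization its reported parameter values refer to.

```python
import numpy as np


def elm_fit(X, T, n_hidden, rng):
    """Classic ELM: random input weights and biases, output weights via the pseudoinverse."""
    W = rng.standard_normal((X.shape[1], n_hidden))  # column i is the input weight vector w_i
    b = rng.standard_normal(n_hidden)                # hidden biases b_i
    H = np.tanh(X @ W + b)                           # N x L hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                     # beta_hat = H^dagger T
    return W, b, beta


def elm_predict(model, X):
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta                 # f(x) = h(x) beta


def rbf_kernel(A, B, gamma):
    # K(x, y) = exp(-gamma * ||x - y||^2), computed for all row pairs of A and B
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))      # clip tiny negative rounding errors


def poly_kernel(A, B, c, d):
    # K(x, y) = (x . y + c)^d
    return (A @ B.T + c) ** d


def kelm_fit(X, T, C, kernel):
    """Kernel ELM: solve (I/C + Omega) alpha = T, where Omega = K(X, X)."""
    omega = kernel(X, X)
    alpha = np.linalg.solve(np.eye(len(X)) / C + omega, T)
    return X, alpha


def kelm_predict(model, X_new, kernel):
    X_train, alpha = model
    # f(x) = [K(x, x_1), ..., K(x, x_N)] (I/C + Omega)^-1 T
    return kernel(X_new, X_train) @ alpha
```

The predicted class of a sample is the index of the largest output, for example `kelm_predict(model, X_test, kernel).argmax(axis=1)`. Solving the linear system with `np.linalg.solve` avoids forming the inverse of I/C + Omega explicitly. Because Omega is an N x N matrix, the cost of kernel ELM training grows quickly with the number of training samples.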

Results and Discussion
Twelve features were selected from the data set; the selected features and their explanations are given in (Table.1). Of the classes to which the flows belong, the seven most common were chosen, and 1000 training samples and 750 test samples were used for each class, giving 7000 samples in total for training and 5250 for testing. The chosen classes are given in (Table.2). The success of classification by machine learning algorithms can be assessed with the evaluation table in (Figure.2), which contains the confusion matrix for classification algorithms and the evaluation metrics [7].
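The class selection and per-class split described above (seven classes, 1000 training and 750 test samples each) can be sketched as follows. This is an illustration under assumptions: `samples` and `labels` are hypothetical parallel lists, and random sampling with a fixed seed is assumed because the paper does not say how the samples were drawn.

```python
import random
from collections import Counter, defaultdict


def select_and_split(samples, labels, n_classes, n_train, n_test, seed=0):
    """Keep the n_classes most frequent classes and draw n_train training and
    n_test test samples from each of them, without overlap."""
    top = {c for c, _ in Counter(labels).most_common(n_classes)}
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        if y in top:
            by_class[y].append(x)
    rng = random.Random(seed)
    train, test = [], []
    for y, xs in by_class.items():
        # rng.sample raises ValueError if a class has fewer than n_train + n_test samples
        picked = rng.sample(xs, n_train + n_test)
        train.extend((x, y) for x in picked[:n_train])
        test.extend((x, y) for x in picked[n_train:])
    return train, test
```

With seven classes this yields 7 x 1000 = 7000 training and 7 x 750 = 5250 test samples, matching the totals above.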

Conclusion
Classifying Internet traffic to make Internet use more efficient is especially important for network administrators who manage corporate networks, and studies on Internet traffic classification have increased recently. Machine learning, which is widely used today to classify Internet traffic, was adopted in this study in the form of the kernel-based extreme learning machine. In this study, accuracies above 95% were achieved using different activation functions. The radial basis function (RBF) and polynomial kernels were used with the kernel-based ELM, and the classification performance was observed while varying the kernel parameters: one parameter was varied for the RBF kernel and two parameters for the polynomial kernel. For the RBF kernel, accuracy was found to increase as the parameter value decreased. A value of 0.01 gave 95.10% accuracy and a value of 0.001 gave 96.27% accuracy, although the running time increased considerably. For the polynomial kernel, selecting parameter values of 1 and 10 gave an accuracy of 93.07%.

Figure. 1. Single-hidden layer feed-forward network architecture

Many kernel functions satisfying Mercer's condition are available in the existing literature, such as the linear, polynomial, Gaussian and exponential kernels. In this paper, the radial basis function (RBF) and polynomial kernels are used for performance analysis.

Figure. 2. Evaluation metrics

In (Figure.2), rows indicate the actual class of an example and columns indicate the class predicted by the classification or clustering algorithm. Accordingly, the metrics mentioned above are defined as follows:
• True Positive (TP): The number of examples that actually belong to class X and are correctly predicted to be in class X.
• False Positive (FP): The number of examples that do not belong to class X but are predicted to be in class X.
• True Negative (TN): The number of examples that do not belong to class X and are predicted not to be in class X.
• False Negative (FN): The number of examples that belong to class X but are predicted not to be in class X.

The results for the RBF and polynomial kernels are given in (Tables 3 & 4).
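The per-class counts defined for the confusion matrix in (Figure.2) can be computed directly from true and predicted labels. The sketch below is illustrative: the class names used with it are hypothetical, and since the full metric list of Figure 2 is not recoverable from the text, precision and recall are shown only as common examples derived from TP, FP and FN.

```python
def confusion_counts(y_true, y_pred, cls):
    """Return (TP, FP, TN, FN) for class `cls`, following the definitions above."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == cls and p == cls:
            tp += 1  # belongs to X, predicted X
        elif t != cls and p == cls:
            fp += 1  # does not belong to X, predicted X
        elif t != cls and p != cls:
            tn += 1  # does not belong to X, not predicted X
        else:
            fn += 1  # belongs to X, not predicted X
    return tp, fp, tn, fn


def class_metrics(y_true, y_pred, cls):
    """Precision and recall for one class (0.0 when the denominator is zero)."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred, cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"precision": precision, "recall": recall}


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted class equals the actual class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

For example, with actual labels `["web", "web", "p2p", "mail"]` and predictions `["web", "p2p", "p2p", "web"]`, class "web" has one example in each of TP, FP, TN and FN, and the overall accuracy is 0.5.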

Table 1. Features and explanations

Table 2. Classes and data sets

Table 3. Kernel radial basis function

Table 4. Kernel polynomial function