Breast Cancer Diagnosis by Different Machine Learning Methods Using Blood Analysis Data

Abstract: Breast cancer is one of the most common types of cancer today. Preventing the spread of malignant cells is crucial for reducing cancer-induced mortality, so cancer must be detected as early as possible. Machine Learning (ML) techniques are used in medicine for diagnosis and for predicting the success of treatment. In this study, four different ML algorithms were used for the early detection of breast cancer. The aim is to process the results of routine blood analysis with different ML methods and to understand how effective these methods are for detection. The methods used are Artificial Neural Network (ANN), standard Extreme Learning Machine (ELM), Support Vector Machine (SVM) and K-Nearest Neighbor (k-NN). The dataset was taken from the UCI library. From this dataset, the attributes age, body mass index (BMI), glucose, insulin, homeostasis model assessment (HOMA), leptin, adiponectin, resistin and chemokine monocyte chemoattractant protein 1 (MCP1) were used. For each of the four ML techniques, the parameters giving the best accuracy were found by hyperparameter optimization. Finally, the results were compared and discussed.


Introduction
Different cancer types have long been a major threat to human life [1]. Among these types, breast cancer has a high mortality rate in women. Unfortunately, this rate is increasing day by day in developed countries [2,3]. Moreover, breast cancer is the second biggest cause of death all over the world [4]. According to World Health Organization (WHO) data [5], breast cancer has been detected in 25% of women in the United Nations [6]. Breast cancer accounts for 16% of all female cancers [5]. Cancer is a disease that starts in the cell and spreads to other parts of the body [7], which is why early detection, before the disease spreads, is crucial. Early diagnosis of breast cancer is the most important and most difficult part of breast imaging [8]. Work on the early detection of breast cancer is not new, but current approaches are not yet capable enough, so scientists are searching for new methods [9]. In particular, Computer-Aided Detection (CAD) systems play a crucial role in early detection [10]. Machine Learning (ML) techniques are used in CAD system applications. ML [11] is an Artificial Intelligence (AI) topic that enables machines to learn a specific task from experience. In recent years, ML methods have become widespread in prediction and detection applications as a means of making robust decisions. For example, ML methods can be used to determine whether a cancer is benign or malignant [9].

Related Works
Many works in the literature address the detection of breast cancer using ML techniques. Some of them are summarized in this part. In [12], the performance of the Support Vector Machine (SVM), K-Nearest Neighbor (k-NN), Decision Tree (C4.5) and Naive Bayes (NB) techniques was compared on the Wisconsin Diagnosis Breast Cancer (WDBC) dataset [13]. SVM gave the best result, with an accuracy of 97.13%. In [14], the K-Means and SVM algorithms were combined into a hybrid method for tumour detection. Evaluated with 10-fold cross-validation on the WDBC dataset [13], the method achieved 97.38% accuracy. The paper [15] examines SVM and Artificial Neural Network (ANN) techniques together, again on the WDBC dataset. The accuracy was 97.14% with SVM and 96.71% with ANN, so SVM gave better results than ANN. Another paper [16] also showed that SVM performs better for the detection of breast cancer. However, the performance of SVM depends on the kernel function, and that paper compared the performance of different types of kernel functions. In [17], the k-NN algorithm was optimized for faster and more reliable classification, and 94.1% accuracy was obtained. The paper [18] covers the use of the ANN, SVM, Decision Tree (DT) and k-NN techniques in breast cancer diagnosis. In [19], the DT, Bayesian Belief Network and SVM techniques were compared. Finally, in [20], breast cancer was detected using ANN classification. That work focuses on the optimal activation function that minimizes the classification error by using fewer blocks.

Material and Methods
This section presents the dataset and the ML methods used.

Data Understanding
The related works show that several different techniques exist for the detection of breast cancer and that the detection problem remains open. Several types of dataset are also available for this task. In this paper, the Breast Cancer Coimbra dataset [22], taken from the UCI [21] ML Repository, was used. This dataset includes features that can be collected in routine blood analysis. These features are age (years), BMI (kg/m²), glucose (mg/dL), insulin (µU/mL), HOMA, leptin (ng/mL), adiponectin (µg/mL), resistin (ng/mL) and MCP1 (pg/dL). Based on these input features, each sample can be classified as healthy or unhealthy. The features were measured in 64 patients with breast cancer and 52 healthy people [22,23]. This dataset differs from others in terms of the features it contains.

Artificial Neural Network (ANN)
The structure of an ANN resembles that of biological neural networks [24]. An ANN is composed of three layers: an input layer, a hidden layer and an output layer. The neurons in adjacent layers are connected to each other, and each connection has its own weight. These weights are updated iteratively until the network outputs are close enough to the target values. Once the weights have been tuned, the system is considered trained, and the testing process can then be performed [25].
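The layered structure described above can be illustrated with a minimal forward pass. This is only a sketch: the sigmoid activation, the function names `sigmoid` and `forward`, and all weight values are assumptions made for illustration and are not taken from the study.

```python
import math


def sigmoid(z):
    """Logistic activation (an assumed choice; the study does not name one)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Propagate one sample through input -> hidden -> output layers."""
    # Each hidden neuron takes a weighted sum of all inputs plus a bias.
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(weights, x)) + b)
        for weights, b in zip(w_hidden, b_hidden)
    ]
    # The single output neuron combines the hidden activations.
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# Hypothetical network: 2 inputs, 3 hidden neurons, 1 output.
w_hidden = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
b_hidden = [0.0, 0.1, -0.1]
w_out = [0.7, -0.5, 0.2]
output = forward([0.25, 0.75], w_hidden, b_hidden, w_out, b_out=0.05)
predicted_class = 1 if output >= 0.5 else 0
```

Training would adjust `w_hidden`, `b_hidden`, `w_out` and `b_out` iteratively so that `output` approaches the target value for each sample.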

Extreme Learning Machine (ELM)
ELM is a method proposed by Huang et al. [26]. It has the same structure as an ANN. The difference is that, while an ANN can have more than one hidden layer, a standard ELM has only one. Moreover, unlike an ANN, a standard ELM can have more than 1000 hidden-layer neurons [27]. ELM is faster than other ML methods because it completes training in a single iteration [28]. The input weights are assigned randomly, and the β values are calculated from the target values using the Moore-Penrose generalized inverse [29].
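The single-step training described above is commonly written as follows. The symbols $X$ (input matrix), $W$ and $b$ (randomly assigned input weights and biases), $g$ (hidden-layer activation function), $H$ (hidden-layer output matrix), $T$ (target values) and $\hat{Y}$ (network output) are notation assumed here for illustration; only $\beta$ is named in the text.

```latex
H = g(XW + b), \qquad \beta = H^{\dagger} T, \qquad \hat{Y} = H \beta
```

Here $H^{\dagger}$ denotes the Moore-Penrose generalized inverse of $H$. Because $W$ and $b$ stay fixed after random assignment, computing $\beta$ is the only training step, which is why no iterative weight updates are needed.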

Support Vector Machine (SVM)
SVM is one of the most recommended ML methods in terms of speed and accuracy [30]. It constructs optimal hyperplanes in a multidimensional space and uses them to separate the data into classes [31]. If the data are linearly separable, the hyperplane can be found by simple calculations. For data that are not linearly separable, the kernel trick is used: the features are mapped to a higher-dimensional space in which they can be separated linearly [32].
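The idea behind the kernel trick can be checked numerically with a small sketch. The degree-2 polynomial kernel and the names `poly_kernel`, `feature_map` and `dot` are chosen here purely for illustration; the study does not state which kernel function was used.

```python
import math


def poly_kernel(x, z):
    """Degree-2 polynomial kernel, computed in the original 2-D space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def feature_map(x):
    """Explicit 3-D feature map whose inner product equals poly_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


x, z = (1.0, 2.0), (3.0, -1.0)
implicit = poly_kernel(x, z)                    # never leaves 2-D
explicit = dot(feature_map(x), feature_map(z))  # computed in 3-D
same_value = math.isclose(implicit, explicit)
```

Both computations give the same inner product, so a classifier that only needs inner products can work in the higher-dimensional space without ever constructing the mapped features.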

K-Nearest Neighbors (k-NN)
k-NN classifies data in the feature space according to distance, which can be calculated with different metrics. To classify a sample, the distances to its k nearest neighbors are examined, and the sample is assigned to the class of those nearest neighbors [33]. Since there is no training phase, the method is easy to understand and implement [34].
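A minimal sketch of this procedure is given below, assuming Euclidean distance and a majority vote among the k closest points. The function names, the tie-breaking rule and the toy data are illustrative assumptions, not details from the study.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train_x, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training points by distance to the query and keep the k closest.
    pairs = sorted(zip(train_x, train_y), key=lambda p: euclidean(p[0], query))
    neighbors = pairs[:k]
    votes = Counter(label for _, label in neighbors)
    top = max(votes.values())
    # Ties (possible for even k) are broken in favor of the closest neighbor.
    for _, label in neighbors:
        if votes[label] == top:
            return label


# Toy, already-normalized data with two hypothetical features per sample.
train_x = [(0.1, 0.2), (0.2, 0.1), (0.8, 0.9), (0.9, 0.8)]
train_y = ["healthy", "healthy", "patient", "patient"]
prediction = knn_predict(train_x, train_y, query=(0.15, 0.15), k=2)
```

All of the work happens at prediction time: the "model" is simply the stored training set, which is why the method has no separate training phase.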

Application and Results
In this study, the blood analysis data from the paper [22], obtained from the UCI library [21], were used. There are 116 samples in total. Some of these data are shown in Table 1.
Target values indicate whether the person is healthy or unhealthy. The maximum and minimum values of the input features differ widely from each other. Normalization must therefore be applied first, to bring the features onto a common scale and to increase the success rate. The Feature Scaling method is used for normalization. The formula for this method is shown in (1).
After normalization using (1), the training and test data were selected randomly from the data. 80% of the whole data were used in the test phase and 20% were used in the training phase.
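The normalization and random split can be sketched as follows. The sketch assumes that the feature scaling in (1) is the usual min-max rescaling, x' = (x - min) / (max - min), which matches the reference to the minimum and maximum of the inputs. The function names, the fixed seed and the sample values are illustrative assumptions.

```python
import random


def min_max_scale(column):
    """Rescale a list of values to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant feature: avoid division by zero
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def random_split(rows, first_fraction, seed=0):
    """Shuffle the rows and cut them into two parts of the given proportion."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * first_fraction)
    return shuffled[:cut], shuffled[cut:]


glucose = [70.0, 92.0, 91.0, 77.0, 201.0]  # illustrative values only
scaled = min_max_scale(glucose)

# Split the indices of 116 samples into an 80% part and a 20% part.
part_a, part_b = random_split(list(range(116)), first_fraction=0.8)
```

Each feature column is scaled independently, so a feature with a large numeric range cannot dominate distance or weight calculations.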
After the training and test data were separated, results were obtained for each ML method.
An interface was created in the MATLAB GUI environment for classification with ANN (see Fig. 1). In ANN, a number of hyperparameters affect the accuracy of the system. The important ones are the number of hidden-layer neurons, the number of epochs, the learning rate and the momentum coefficient. These values must be set by trial and error to obtain the best ANN result. For this reason, the interface lets the user vary each of these parameters over a chosen range. The graphs in the interface show the Root Mean Square Error (RMSE) as a function of the parameter being varied, and the parameter values that gave the minimum error were recorded. Training and testing were then carried out with the best parameters. As a result, the average test accuracy rate was 79.4304%, the average training time was 0.4282 seconds and the average RMSE was 0.3954. A comparison of these values with ELM is shown in Table 2.

An interface was also created in the MATLAB GUI environment for standard ELM classification (see Fig. 2). In standard ELM, the hyperparameter that affects the accuracy of the system is the number of hidden-layer neurons. This number is varied within a range, which can be set by the user, to achieve the best ELM result. RMSE values are plotted against the varied parameter. The best number of hidden-layer neurons was found to be 1800 (see Fig. 2). As a result, the average test accuracy rate was 80%, the average training time was 0.0075 seconds and the average RMSE was 0.4755. A comparison of these values with ANN is shown in Table 2.
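The parameter sweeps described above amount to evaluating the RMSE for each candidate setting and keeping the one with the lowest error. The sketch below shows that selection logic only. The names `rmse`, `best_setting` and `toy_evaluate` are hypothetical, and `toy_evaluate` is an artificial stand-in for training a model, so its numbers carry no meaning for the study.

```python
import math


def rmse(targets, outputs):
    """Root mean square error between target and predicted values."""
    squared = [(t - o) ** 2 for t, o in zip(targets, outputs)]
    return math.sqrt(sum(squared) / len(squared))


def best_setting(candidates, evaluate):
    """Return the candidate with the lowest error, together with that error."""
    scored = [(evaluate(c), c) for c in candidates]
    error, setting = min(scored)
    return setting, error


def toy_evaluate(n_hidden):
    """Artificial error curve; a real run would train a model and return
    its RMSE for the given number of hidden-layer neurons."""
    return (n_hidden - 1000) ** 2 / 1e6 + 0.4


example_error = rmse([1, 0, 1, 0], [1, 0, 0, 0])
setting, error = best_setting(range(200, 3001, 200), toy_evaluate)
```

With several hyperparameters, as in the ANN case, the same selection can be repeated for one parameter at a time or applied to combinations of values.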

Fig. 2. Designed ELM interface and results
Table 2 shows that the accuracy values of ANN and standard ELM are close to each other, but ELM is much faster than ANN. When the number of training samples is very high, standard ELM is therefore much more advantageous in terms of time.

Hyperparameter optimization is also used for classification with k-NN. For k-NN, the hyperparameters are the number of neighbors and the distance type. The resulting hyperparameter optimization in the MATLAB environment is shown in Fig. 3. The optimum parameter values are determined from this graph. Euclidean distance was used as the distance type and the number of neighbors was chosen as two. The average accuracy rate was obtained as 77.5%. Using the best parameters, the training data were classified in 0.15781 s.

Conclusion and Discussion
In this study, the Breast Cancer Coimbra dataset [22], taken from UCI [21], was used. This dataset differs from other datasets in terms of feature type. It includes the age, BMI, glucose, insulin, HOMA, leptin, adiponectin, resistin and MCP1 features, which can be collected in routine blood analysis. The significance of these data in breast cancer detection was investigated with ML methods. The values in Table 3 show that standard ELM provides the highest accuracy rate and the shortest training time.
According to these results, standard ELM is more advantageous in terms of time when the number of samples is high. This work is significant because it uses a different type of data and because it compares four different ML methods. The accuracy rates obtained cannot be regarded as very high. However, the study investigated the utility of such data for breast cancer detection with ML methods, and it may support further work in this field.

Fig. 3. Hyperparameter Optimization for k-NN algorithm

Hyperparameter optimization is also used for classification with SVM. The hyperparameters of SVM are the regularization constant (box constraint, C) and the kernel scale. The soft margin method was used in the SVM classification. The resulting hyperparameter optimization in the MATLAB environment is shown in Fig. 4. The optimum parameter values are determined from this graph. The optimal kernel scale was found to be 0.0287 and the optimal C value was 0.4869. As a result, the average accuracy rate was obtained as 73.5%. Using the best parameters, the training data were classified in 0.1866 s.

Fig. 4. Hyperparameter Optimization for SVM algorithm

Analysis was performed with four different ML methods. Interfaces were developed for ANN and ELM. In addition, the hyperparameter values giving the lowest error for the ANN, ELM, k-NN and SVM methods were determined using hyperparameter optimization. Accuracy rates and training times were obtained with these values and are shown in Table 3. The k-NN method does not actually have a training phase; the value in Table 3 represents the computation time for the training data.

Table 1. Some data used for breast cancer detection

Table 2. Comparison of ANN and ELM results for 10 data

Table 3. Comparison of ML algorithms

IJISAE, 2018, 6(4), 289-293