INTELLIGENT SYSTEMS AND APPLICATIONS IN ENGINEERING Comparison of Spectral and Template Matching Features for SSVEP BCI Target Frequency Classification

Abstract: Brain-computer interfaces (BCIs) provide new communication and control channels that restore and support these functions for users whose abilities are restricted. Among these, Visual Evoked Potential (VEP) based BCIs are the most promising in terms of ease of use and performance. The frequency-following property of VEPs produces Steady-State Visual Evoked Potentials (SSVEPs) at the frequency at which the human visual system is stimulated. In such interface systems, each target is encoded with a particular stimulation frequency and phase. In speller interfaces intended for communication, each target shows a letter or character that flickers with a particular stimulation frequency and phase. The computer must then detect which target the user is focusing on. In this process, the classification method and the feature extraction method play critical roles. This study used a publicly available benchmark dataset of a 40-target SSVEP BCI. In the analysis, two feature vectors are obtained from power spectrum parameters and one from stimulus template matching correlation coefficients. The performance of three classification methods, namely Fine Tree, Linear Discriminant Analysis (LDA) and K-Nearest Neighbors (KNN), is compared using these feature vectors. Spectral features performed better than the template matching features. In particular, the feature vector built from the ratio of the target frequency signal to the total stimulation band energy (TFSR) provided better accuracy values. LDA and KNN performed better than the decision tree in classification. (https://creativecommons.org/licenses/by-sa/4.0/)


Introduction
Visual Evoked Potentials (VEPs) are electrophysiological signals that originate from the electrical activity of the visual cortex and are recorded with electrodes placed on the scalp. There are three VEP types depending on the stimulus type: Flash VEP, Pattern-onset/offset VEP, and Pattern-reversal VEP [1]. In the Flash VEP paradigm, a brief luminance increment or flash is presented to the subject, and it produces a characteristic response whose most stable peaks occur at N2 (90 ms) and P2 (120 ms) [2]. This response is normally difficult to observe because of background EEG activity, so the experiment has to be repeated many times and the recordings synchronously averaged to enhance the characteristic VEP signal. If the stimulus repetition interval is longer than 250 ms (stimulation rate < 4 Hz), so that the visual system can return to a stationary initial state before each stimulus begins, a brief luminance change creates a characteristic response known as the Transient VEP [3][4][5]. The Transient VEP has a number of characteristic peaks time-locked to the stimulus. For this reason, it has been widely used in Event Related Potential (ERP) studies and in monitoring diseases of the visual pathway. On the other hand, rapid visual stimulation with a fixed stimulus repetition interval shorter than 250 ms (stimulation rate > 4 Hz) generates a Steady-State Visual Evoked Potential (SSVEP), which contains a constant fundamental frequency because the successive evoked potential peaks overlap periodically [6][7][8]. SSVEP spectra contain a major component at the stimulation frequency as well as its harmonics and subharmonics. A Brain-Computer Interface (BCI) is a system developed to give its users the ability to interact with the environment by translating specific brain signals into desired words or actions, in order to improve quality of life.
A BCI records electrical brain activity and links it to actions in the external environment or to internal body parts in order to improve the natural brain outputs [9]. BCIs have gained tremendous popularity in the last decade, and SSVEP BCIs in particular have attracted much interest because of several properties. Their advantages include easy system configuration without a need for synchronization [10]. Neither training nor an initial recording to generate a VEP template is required. They provide a high Information Transfer Rate (ITR) (usually >50 bits/min). Although the LCD screen refresh rate limits the number of frequencies available for targets, phase encoding allows more than one target to be assigned to the same frequency [11]. Another advantage is that they can be implemented with common LCD screens, and the stimuli can range from colors and patterns to faces. In a multiple-target SSVEP BCI, target detection is challenging because the frequencies mix. Although time and spectral features exist for EEG classification, an ideal feature set for multi-target SSVEP classification is lacking. Since the optimal selection of features gives rise to the best performance, the potential feature extraction methods for SSVEP need to be studied in detail. This research aims to investigate the effects of three feature extraction mechanisms and three classification methods on classification accuracy. These feature extraction methods employ features from different domains, either the time domain or the spectral domain. In general, power spectral density analysis (PSDA) or Signal to Noise Ratio (SNR) analysis of the target frequency is an easy and efficient solution. However, it can be further improved by also incorporating the information at the other target frequencies into the feature vector. This study aims to illustrate the potential of such a feature vector in SSVEP BCI classification.

Setup and Stimulation
A general SSVEP BCI recording system is shown in Figure 1. A stimulus matrix of 5 rows and 8 columns was created to present 40 characters to the subject on a 23.6 inch Acer GD245HQ LCD monitor [12]. The response time of the monitor was 2 ms. The pixel resolution and refresh rate were 1920 x 1080 pixels and 60 Hz. At a viewing distance of 70 cm, the total speller area covered 34° x 24° on the horizontal and vertical axes respectively. Each target stimulus covered 3.2° x 3.2° of the visual field. The separation between stimulus targets was 1.14° on both the horizontal and vertical axes. The joint frequency and phase modulation (JFPM) method was used to present the 40 visual target stimuli [13]. A sampled sinusoidal stimulation method was applied; the details can be found in [12][14][15]. The target frequencies used to elicit SSVEP responses ranged from 8 Hz to 15.8 Hz in steps of 0.2 Hz, and the target phases changed by multiples of π/2, as shown in Figure 2.
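The frequency and phase grid described above can be sketched as follows. The frequency range, step, target count and refresh rate come from the text. The phase rule (phase advancing by π/2 with each 0.2 Hz step, wrapped to 2π) and the luminance formula are assumptions made for illustration, based on the common sampled sinusoidal form; the actual assignment is the one given in Figure 2 and [12]. All function names are hypothetical.

```python
import math

REFRESH_RATE_HZ = 60   # monitor refresh rate (from the text)
N_TARGETS = 40         # 5 x 8 stimulus matrix (from the text)


def target_frequencies(f0=8.0, step=0.2, n=N_TARGETS):
    """Stimulation frequencies 8.0, 8.2, ..., 15.8 Hz."""
    return [round(f0 + step * k, 1) for k in range(n)]


def target_phases(n=N_TARGETS, phase_step=math.pi / 2):
    """ASSUMPTION: phase grows by pi/2 per frequency step, wrapped to [0, 2*pi)."""
    return [(k * phase_step) % (2 * math.pi) for k in range(n)]


def luminance_sequence(freq_hz, phase_rad, n_frames, refresh_rate=REFRESH_RATE_HZ):
    """ASSUMED sampled sinusoid: one luminance value in [0, 1] per screen frame."""
    return [
        0.5 * (1.0 + math.sin(2 * math.pi * freq_hz * i / refresh_rate + phase_rad))
        for i in range(n_frames)
    ]


freqs = target_frequencies()
phases = target_phases()
# 5 s of stimulation at 60 frames per second for the first target
frames = luminance_sequence(freqs[0], phases[0], n_frames=5 * REFRESH_RATE_HZ)
```

Because the luminance is sampled once per screen frame, any frequency below half the refresh rate can be rendered, which is what allows the 0.2 Hz spacing on a 60 Hz monitor.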

Dataset
The dataset was obtained from the webpage of the Tsinghua University Biomedical Engineering BCI Group (http://bci.med.tsinghua.edu.cn/download.html). It contains data from 35 subjects tested on a BCI setup with a wide range of target frequencies (40) encoded with JFPM. For each subject, the experimental paradigm consisted of gazing tasks at each target.
One trial was recorded for each target. Each gazing and VEP recording task was repeated 6 times in one trial. The total number of epochs is 40 trials x 6 epochs/trial = 240 epochs. In each epoch, a 0.5 s target cue was followed by 5 s of simultaneous stimulation of all the targets on the screen, and the epoch ended with a 0.5 s rest interval, giving a total epoch duration of 6 s. The 5 s visual flicker stimulation was presented on the 60 Hz LCD screen. Data was collected with a 64-channel EEG recording setup using the international 10/20 electrode system and was referenced to Cz. The sampling rate was 1 kHz, and a notch filter was applied to remove 50 Hz power-line noise. To reduce the data storage size, the data was down-sampled to fs = 250 Hz. Since the dataset includes both experienced and inexperienced subjects, an equal number of subjects (7 subjects) was selected from each group for analysis. The data for subject 5 was not available in the dataset and was excluded from the analysis. The details of the subjects are shown in Table-1.
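The epoch timing above translates into sample counts as sketched below. The durations and the 250 Hz rate come from the text; the assumption that a stored epoch starts at cue onset, and the helper names, are illustrative only.

```python
FS_HZ = 250                              # sampling rate after down-sampling
CUE_S, STIM_S, REST_S = 0.5, 5.0, 0.5    # epoch structure from the text


def stimulation_window(fs=FS_HZ, cue_s=CUE_S, stim_s=STIM_S):
    """Sample indices [start, stop) of the 5 s stimulation period.

    ASSUMPTION: the stored epoch begins at cue onset.
    """
    start = int(round(cue_s * fs))
    stop = start + int(round(stim_s * fs))
    return start, stop


def extract_stimulation(epoch_samples, fs=FS_HZ):
    """Keep only the samples recorded while the targets were flickering."""
    start, stop = stimulation_window(fs)
    return epoch_samples[start:stop]


epoch_len = int(round((CUE_S + STIM_S + REST_S) * FS_HZ))  # samples per 6 s epoch
start, stop = stimulation_window()
n_epochs = 40 * 6                                          # epochs per subject
```

With these values, the stimulation period holds 5 s x 250 Hz = 1250 samples out of the 1500 samples in each 6 s epoch.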

Data Pre-processing
In this study, Laplacian spatial filtering of four channels is used: the average of the O1 (61), O2 (63), and Pz (48) recordings is subtracted from the Oz (62) recording. Laplacian spatial filtering improves BCI performance [16]. Here it reduced the common background EEG noise and enhanced the SSVEP response at the Oz channel. In addition to the spatial filtering, a band-pass filter was applied. The filter was a 3rd-order Butterworth band-pass filter with a pass band from 4 Hz to 32 Hz. The filtfilt() function in Matlab was used to prevent phase delays in the signal due to filtering.
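A minimal sketch of the four-channel Laplacian step is given below. It assumes the epoch is stored as a list of 64 channel rows and that the channel numbers quoted in the text are 1-based, so they are shifted by one for Python indexing. The function name is hypothetical, and the zero-phase band-pass stage is not reproduced here.

```python
# ASSUMPTION: the text's channel numbers (Oz=62, O1=61, O2=63, Pz=48) are
# 1-based, so the 0-based row indices are one lower.
OZ, O1, O2, PZ = 61, 60, 62, 47


def laplacian_oz(epoch):
    """Return Oz minus the mean of O1, O2 and Pz, sample by sample.

    epoch: sequence of channels, each a sequence of samples.
    """
    return [
        oz - (a + b + c) / 3.0
        for oz, a, b, c in zip(epoch[OZ], epoch[O1], epoch[O2], epoch[PZ])
    ]
```

Activity that is common to all four electrodes cancels in the subtraction, while a response that is stronger at Oz than at its neighbours is retained.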

Feature Extraction
Feature vectors based on three methods are compared in this study: time-domain SSVEP template-matching cross-correlation coefficients (TMCC), a PSDA (SNR) based feature vector, and a feature vector of the target frequency signal ratio (TFSR) in the stimulation frequency band. Although other time and spectral domain features exist in the literature, a preliminary analysis found them to be of very limited use in SSVEP detection. Therefore, those basic features are excluded from further analysis. Each feature extraction method is described below.

Target Frequency SNRs as Feature Vector
PSDA-based methods are the common target detection methods used in SSVEP BCIs [17][18][19]. In PSDA-based target frequency detection, the SNR of each target frequency is computed in terms of energies using formula (1), where S(f_k) is defined as the ratio of the energy at the kth frequency to the overall energy in the n neighbouring frequency bins. P(f_k) is the FFT amplitude value at the kth frequency bin, f_k is the stimulus frequency, and f_res is the frequency resolution of the FFT. P = FFT(x), where x is the 1250-point temporal EEG data. Since the sampling rate is 250 Hz, f_res is 250/1250 = 0.2 Hz. The number of neighbouring frequencies is set to n = 5 so that the spectrum from -1 Hz to +1 Hz around f_k is included in the computation of the PSDA energy ratio S(f_k). Finally, an output vector of 40 S(f_k) values is constructed for each epoch as the feature vector.
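The SNR feature can be sketched as below. This is one reading of the definitions in the text, not necessarily the exact formula the study used: it takes $S(f_k) = P(f_k) / \sum_{m=1}^{n} [P(f_k - m f_{res}) + P(f_k + m f_{res})]$ with n = 5 bins on each side (so ±1 Hz at 0.2 Hz resolution), and it uses the squared DFT magnitude as the energy measure. Both choices, the small `eps` guard, and all names are assumptions. A direct DFT at the needed bins is used so the sketch needs no external library.

```python
import cmath
import math

FS_HZ = 250
N_SAMPLES = 1250
TARGET_FREQS = [round(8.0 + 0.2 * k, 1) for k in range(40)]


def bin_power(x, bin_index):
    """Squared DFT magnitude of x at one bin (ASSUMED energy measure)."""
    w = -2j * math.pi * bin_index / len(x)
    return abs(sum(v * cmath.exp(w * i) for i, v in enumerate(x))) ** 2


def snr_features(x, n_neighbours=5, eps=1e-12):
    """One S(f_k) per target: power at f_k over the summed power of the
    n_neighbours bins on each side (ASSUMED neighbourhood definition)."""
    f_res = FS_HZ / len(x)            # 0.2 Hz for 1250 samples at 250 Hz
    cache = {}

    def power(b):
        if b not in cache:
            cache[b] = bin_power(x, b)
        return cache[b]

    features = []
    for f in TARGET_FREQS:
        k = round(f / f_res)          # FFT bin of the target frequency
        noise = sum(power(k - m) + power(k + m)
                    for m in range(1, n_neighbours + 1))
        features.append(power(k) / (noise + eps))
    return features


# Synthetic check signal: a 10 Hz sinusoid lasting 5 s
x = [math.sin(2 * math.pi * 10.0 * i / FS_HZ) for i in range(N_SAMPLES)]
features = snr_features(x)
best_target = max(range(len(features)), key=features.__getitem__)
```

Note that with targets spaced 0.2 Hz apart, the ±1 Hz neighbourhood of each target contains other target frequencies, so the denominator is not pure background noise.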

Target Frequency Signal Ratio (TFSR) in Stimulation Frequency Band Feature Vector
The Target Frequency Signal Ratio (TFSR) is the ratio of the target frequency energy to the total energy in the stimulation frequency band. It incorporates the energies at all the target frequencies, which are not included in the SNR feature vector. The TFSR can be computed using (2): $TFSR(f_k) = P(f_k) / \sum_{i=1}^{N} P(f_i)$ (2), where N is the total number of stimulus frequencies (N = 40) and k is the target frequency index (k = 1, ..., 40). A feature vector of all 40 TFSR values is computed for each epoch of the training and test data.
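As an illustrative sketch, the TFSR vector is a normalisation of the spectral values at the 40 stimulus frequencies. The function below assumes those values have already been extracted from the spectrum, one per target; the function name and the zero-total guard are assumptions.

```python
def tfsr_features(target_powers):
    """Divide each target-frequency value by the sum over all targets.

    target_powers: spectral values P(f_1), ..., P(f_N), one per stimulus
    frequency (assumed to be precomputed and non-negative).
    """
    total = sum(target_powers)
    if total == 0:
        return [0.0] * len(target_powers)   # guard for an all-zero spectrum
    return [p / total for p in target_powers]


example = tfsr_features([1.0, 3.0])
```

Because every element shares the same denominator, the vector sums to one and the largest element always belongs to the frequency with the most energy among the targets.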

Template Matching Correlation Coefficient (TMCC) Feature Vector
Another method for target frequency detection is template matching. An SSVEP template is created for each target using the stimulus properties. An example of a recorded SSVEP signal and its template is shown in Figure 3. The SSVEP response is expected to have the same frequency as the stimulation, with a delay of about 136 ms, since the first peak latency of the flash VEP response is reported as about 136 ms in [12]. Therefore, an SSVEP template T_k(n) is created for each target frequency using (3). Here f_k is the target stimulus frequency given in Figure 2 and θ_k is the corresponding phase. The average latency delay (Δ) of the flash VEP is subtracted from the phase. The TMCC feature vector is computed using the correlation coefficient equation [19], as in (4).
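A sketch of the template-matching feature is given below. The sinusoidal template form $T_k(n) = \sin(2\pi f_k (n/f_s - \Delta) + \theta_k)$ is an assumption consistent with the description (target frequency, target phase, latency subtracted from the phase), and the correlation is taken to be the Pearson coefficient. The study's exact expressions may differ, and all names are hypothetical.

```python
import math

FS_HZ = 250
LATENCY_S = 0.136     # average flash VEP latency quoted in the text


def ssvep_template(freq_hz, phase_rad, n_samples, fs=FS_HZ, latency_s=LATENCY_S):
    """ASSUMED template: sin(2*pi*f*(n/fs - latency) + phase)."""
    return [
        math.sin(2 * math.pi * freq_hz * (n / fs - latency_s) + phase_rad)
        for n in range(n_samples)
    ]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def tmcc_features(x, freqs, phases):
    """Correlation of the EEG segment x with every target template."""
    return [pearson(x, ssvep_template(f, p, len(x)))
            for f, p in zip(freqs, phases)]
```

A typical use would pass a 1250-sample segment together with the 40 target frequencies and phases; the target with the largest coefficient is the template-matching estimate of the attended stimulus.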

Classification
In the classification stage, three classifiers are used: decision tree (fine tree), Linear Discriminant Analysis (LDA) and KNN.
To avoid possible errors due to non-homogeneous distributions in the data, 5-fold cross-validation is applied in the classifier learning stage. The data is divided into 5 random partitions; 4 partitions are used for training and 1 partition for testing, and this is repeated 5 times. The performance outputs obtained with each division of the data are collected, and their arithmetic mean is taken as the final metric, in particular the training accuracy. Two thirds of the data are used in training. To evaluate performance on unseen data, the remaining one third is used as test data. Performance is assessed using the accuracy metric, which can be computed using (5): $Accuracy = (TP + TN) / (TP + TN + FP + FN)$ (5),
where TP, TN, FP and FN represent the true positive, true negative, false positive and false negative classification counts respectively.
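The data partitioning and the accuracy metric can be sketched as follows. The reading that the 5 folds are drawn from the two-thirds training portion, the fixed random seed, and the helper names are assumptions made for illustration.

```python
import random


def holdout_split(n_items, train_fraction=2 / 3, seed=0):
    """Shuffle the item indices and split them into training and test sets."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = round(n_items * train_fraction)
    return indices[:n_train], indices[n_train:]


def k_fold(indices, k=5):
    """Yield (train, validation) index lists; each item is validated once."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def accuracy(tp, tn, fp, fn):
    """Fraction of correct decisions among all decisions."""
    return (tp + tn) / (tp + tn + fp + fn)


# 240 epochs per subject: two thirds for training, one third held out
train_idx, test_idx = holdout_split(240)
splits = list(k_fold(train_idx))
```

The cross-validated accuracy is then the arithmetic mean of the accuracies obtained on the 5 validation folds, while the held-out third gives the accuracy on unseen data.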

Decision Tree
Decision tree based classification simplifies a complex classification process into a collection of simpler decisions [20]. A decision tree classifier tries to have the simplest possible structure, to classify as much of the training data as accurately as possible, to generalize so that it performs well with unseen data, and to be easily updated with more training [20]. The fine tree method is selected for its fast implementation, high flexibility, and small memory usage. In a fine tree, many leaves are used in order to obtain a fine separation between classes.

LDA
LDA separates the training feature vectors of different classes using hyperplanes [21][22]. For two-class classification, the classes are defined by a linear discriminant function that represents a hyperplane boundary in the feature space. The class to which a feature vector belongs depends on the side of the plane on which the vector lies. To classify more than two classes, i.e. an N-class problem (N > 2), several hyperplanes are used. For multiclass BCI classification, a one-vs-all separation is repeated for each class to find the correct class among all. The location and orientation of each hyperplane are set using the training data [24]. One decision plane can be defined as $g(x) = w^{T}x + w_0$, where w is the weight vector, x is the input feature vector and w_0 is a threshold. The sign of g(x) indicates on which side of the plane the input lies and therefore determines its class [24].
The LDA classifier is simple and has low complexity, yet it is efficient. It does not need high computational power and still achieves good accuracy for BCI classification purposes, especially in online BCI systems [25][26]. The simplicity of this classifier makes it good at generalizing to unseen test data, and it yields good accuracy values in practice [26].
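The decision rule above can be sketched as below. The weights in the example are hypothetical and not fitted to any data, and resolving the one-vs-all case by picking the largest discriminant value is a common convention that is assumed here rather than taken from the study.

```python
def discriminant(w, w0, x):
    """Linear discriminant value: dot product of w and x, plus the threshold w0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0


def two_class_label(w, w0, x):
    """The sign of the discriminant selects the side of the hyperplane."""
    return 1 if discriminant(w, w0, x) >= 0 else 0


def one_vs_all_label(planes, x):
    """planes: one (w, w0) pair per class.

    ASSUMPTION: the class whose discriminant value is largest wins.
    """
    scores = [discriminant(w, w0, x) for w, w0 in planes]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical weights for a 2-feature, 2-class toy problem
planes = [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0)]
label = one_vs_all_label(planes, [0.2, 0.9])
```

Prediction only needs one dot product per class, which is why the classifier is cheap enough for online use.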

KNN
In K-Nearest Neighbours, as the name suggests, the class of an unseen test sample is set to the dominant class among its k nearest neighbours within the training set [22]. It relies on the principle that the features of each class form a separate cluster in feature space [23]. The strength of KNN is that, with a sufficiently high k and enough training samples, it can approximate any function, including nonlinear decision boundaries [26].
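A minimal sketch of the voting rule is shown below. The Euclidean distance metric, the default k = 3, and the function name are assumptions, since the text does not state the settings used.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, x, k=3):
    """Label x with the majority class among its k nearest training samples.

    ASSUMPTION: Euclidean distance between feature vectors.
    """
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy training set with two well-separated clusters
train_x = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
train_y = [0, 0, 1, 1]
predicted = knn_predict(train_x, train_y, [0.2, 0.3])
```

No model is fitted in advance; all the work happens at prediction time, so the cost grows with the size of the training set.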

Results and Discussion
The accuracy for each combination of subject, feature and classification method was calculated. Figure-4 [...] are 54.3%, 37.4%, and 63.6%. Since there is a large difference between the performances of inexperienced and experienced subjects, further comparison analyses were carried out independently for each experience group. TFSR combined with KNN or LDA achieved the highest accuracy compared to the other features and classifiers (p<0.05), see Figure-5. For experienced subjects, TFSR significantly improved accuracy compared to SNR and TMCC, by 6% and 33% respectively. For inexperienced subjects, improvements of 17% and 70% are observed when TFSR is used instead of SNR or TMCC respectively. The correlation coefficients between the average performances of the feature sets were also computed. The feature performances are strongly correlated (0.98 for TFSR-SNR, 0.93 for TFSR-TMCC, 0.93 for SNR-TMCC); however, TFSR has the highest classification performance and can be used in SSVEP BCI classification tasks.
In conclusion, using the ratio of each target frequency signal to the total stimulation frequency band energy is a promising feature extraction method for achieving higher accuracy. It is easy and efficient, and it can replace the traditional PSDA methods. In addition, the LDA and KNN classifiers provide acceptable accuracies, with high and medium prediction speeds respectively, see Figure-5. Moreover, these performances are obtained using only 4 electrodes, which is feasible for a typical BCI implementation intended for daily-life use. In future studies, the performance of the proposed TFSR feature can be analyzed with other datasets and classification methods.