INTELLIGENT SYSTEMS AND APPLICATIONS IN ENGINEERING

Selected Three Frame Difference Method for Moving Object Detection

Abstract: The three frame difference method is one of the well-known methods used to perform moving object detection. In this approach, the presence of a moving object is estimated by subtracting three consecutive image frames, which yields the edges of the moving object. However, these edges do not provide complete information about the moving object, meaning that the method suffers from information loss. Post-processing techniques such as morphological operations, optical flow, or combinations of these are therefore necessary to obtain complete information about the moving object. In this paper, we present a new approach called Selected Three Frame Difference (STFD) to detect moving objects in video sequences without any post-processing operations. We first propose an algorithm that selects three images based on the local maximum value of the frame differences. Instead of three consecutive frames, the differences of these three selected images, which contain non-overlapping object regions, are combined with the logical AND operator. We mathematically prove that the entire moving object is always detected in the second selected image. We evaluated the proposed method on public benchmark datasets and on a dataset collected in our laboratory. To validate the performance of our approach, we also compared it with the three frame difference method and with traditional background-subtraction-based moving object detection methods on sample videos selected from different datasets.


Introduction
Moving object detection in video sequences has recently become essential for more intelligent analysis in computer vision applications such as daily and wildlife surveillance, retrieval and recognition [1][2][3][4][5][6][7][8]. A moving object in an image frame can be defined as a region formed by a set of connected pixels that are segmented from the background. In general, the spatio-temporal changes between consecutive images caused by a moving object are used to distinguish the object from the background [9][10][11][12]. However, there are many difficulties in obtaining a robust model for detecting moving objects in video sequences captured in either indoor or outdoor scenes. Lighting conditions, sudden fluctuations in the direction of object motion, complex backgrounds, shadows and non-rigid object deformation are important problems that have been studied from different points of view over the last decades, as reported in [13]. Background subtraction, optical flow and frame difference are well-known algorithms for locating moving objects in video frames. Although the mathematical approximation is simple in all of these methods, in real applications each has its own advantages and disadvantages. Background subtraction detects objects by subtracting a background model, which is estimated from consecutive image sequences by averaging intensities [14][15][16][17]. This method achieves good accuracy when the camera is static and only small movements are present in the background. Optical flow requires significant displacement of image points, defined as the locations of moving pixels over the image frames [18,19]. Optical flow is represented by two-dimensional vectors encoding the velocity and direction of image points.
Since optical flow is sensitive to illumination changes and noise, additional processing is required to obtain discriminative results, which makes it computationally complex. The frame difference method, also commonly referred to as the temporal difference method, estimates the presence of a moving object between two consecutive images [4,20]. The frame difference method [20,21], the three frame difference method (TFD) [11,22,24] and the four frame difference method [27] are variants of the frame difference algorithm proposed in the literature over the last decades. The output of the frame difference operation is the union of the moving objects of two consecutive images [20]. TFD methods apply the logical AND (&) operator to the outputs of two frame differences obtained from three consecutive images. The four frame difference method uses four consecutive images with a threshold parameter in order to avoid the shadows present in TFD. Although frame difference methods can produce effective results, they have disadvantages such as leaving holes in foreground objects and detecting ghosts in background regions. In the classical three frame difference approach [11], the edges of moving objects are approximately detected from three consecutive images. However, this is not sufficient to represent all boundary pixels of the moving object. Edge combining algorithms [23], morphological analysis [28] and optical flow [19] are therefore used to improve the performance of the three frame difference method so that every pixel of the moving object can be obtained. Morphological operations such as erosion and dilation are used to remove noise and fill the edges. In addition, tendency and trend directions are used to combine the detected edges. This approach is inefficient in the presence of a dynamic background [24,25]. A time-lapsed image selection method has been applied to create the background model and identify the moving regions of the images [26].
Two-dimensional global thresholds have been applied to the regions of interest to effectively remove ambient noise. In a recent study [24], the background model is computed by averaging the intensity values of five consecutive images, after which the outline of the moving object is obtained. However, this approach cannot detect more than one moving object. To overcome the limitations of conventional post-processing algorithms and to obtain complete information about moving objects without applying any processing after the TFD method, we propose a new TFD-based algorithm called selected three frame difference (STFD). The contribution of our approach lies in the selection of the three images used by the TFD method. In the proposed method, images are selected by computing local maximum differences between two images instead of taking consecutive images. This ensures that the output of the frame difference operation contains discrete, non-overlapping and complete moving object regions. We also mathematically prove that the second selected image is always obtained after applying the logical AND (&) operator to the frame differences. The paper is organized as follows: Section 2 describes the proposed STFD algorithm in detail and mathematically proves our assumption for selecting images. Section 3 shows the performance of the proposed algorithm on benchmark datasets and on our own dataset. Section 4 summarizes our findings and gives future directions.

Three Frame Difference
In classic frame difference approaches [11,19], moving regions are detected by subtracting consecutive images, assuming that the image sequences are captured by a static camera. Let Ik(x,y) denote the intensity of the kth frame at coordinates x and y. The difference image Id(x,y) used to detect the moving region is calculated as

Id(x, y) = | I(k+1)(x, y) - Ik(x, y) |. (1)

To remove noise, a median filter is applied to the difference image. Then, a global threshold Td converts the grayscale difference image into a binary image:

If(x, y) = 1 if Id(x, y) > Td, and If(x, y) = 0 otherwise, (2)

where If(x, y) is the binary image at coordinates x and y. The TFD method first computes frame differences and then applies a logical operator to consecutive images. Two frame difference outputs are obtained from three consecutive images, and the moving regions are detected by applying the logical AND (&) operator to these outputs:

Ib(x, y) = If1(x, y) & If2(x, y), (3)

where Ib is the binary TFD output, If1 is the binarized difference between the kth and (k+1)th images and If2 is the binarized difference between the (k+1)th and (k+2)th images. With these expressions, the edges of the moving object in the (k+1)th image are obtained by applying the logical AND (&) operator to the two frame difference outputs [19,24]. Fig. 1 shows the overall structure of our proposed image selection method. Our main contribution is that instead of consecutive images, the three-frame difference method uses selected images that contain non-overlapping object regions. The details of the proposed method are described below. We denote the image sequence as x = x1, ..., xn, where n is the number of image frames.
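The frame difference and TFD operations described above can be sketched in Python with NumPy (a minimal sketch, not the authors' implementation; the median filtering step is omitted and the threshold value is an assumed parameter):

```python
import numpy as np

def frame_difference(frame_a, frame_b, threshold=17):
    """Binarized absolute difference of two grayscale frames (If in the text)."""
    # Cast to a signed type so the subtraction does not wrap around.
    diff = np.abs(frame_a.astype(np.int16) - frame_b.astype(np.int16))
    return (diff > threshold).astype(np.uint8)

def three_frame_difference(f_k, f_k1, f_k2, threshold=17):
    """Classic TFD: logical AND of two consecutive frame differences.

    Returns the binary map Ib, which approximates the edges/regions of the
    moving object in the middle frame f_k1.
    """
    i_f1 = frame_difference(f_k, f_k1, threshold)
    i_f2 = frame_difference(f_k1, f_k2, threshold)
    return np.logical_and(i_f1, i_f2).astype(np.uint8)
```

For an object that appears at one location in the middle frame and a different, non-overlapping location in the third frame, only the middle-frame region survives the AND, which is exactly the behavior the STFD selection step later exploits.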

Proposed Image Selection Method
Step 1: The kth frame, denoted Id1, is the first image selected when a moving object is detected in the image sequence.
Step 2: The intensity differences between the (k+i)th frames and the first selected frame k are calculated in turn. As the object moves within the scene, the total amount of intensity change D increases, as described in Algorithm 1. This is because the overlapping region of the moving object decreases. Fig. 2 shows a conceptual diagram of the intersection region of the moving object as a function of the intensity changes and differences. To select the second image, we propose the algorithm described in Algorithm 2, which determines the local maximum of the total amount of intensity change, denoted LMj, where j is the index of the selected second frame. The second image, denoted Id2, is the jth frame. Finally, the difference between Id1 and Id2, denoted If1, is calculated.
Step 3: The third image Id3 is selected by computing the next local maximum LM(j+m), where j+m is the index of the selected third frame relative to the selected jth second frame. In this case, the total amount of intensity change is calculated with respect to the second selected image Id2. Finally, the difference between Id2 and Id3, denoted If2, is calculated.
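The selection rule in Steps 2 and 3 can be sketched as follows. Since Algorithms 1 and 2 are not reproduced in the text, this is only an assumed reading of the stopping rule: scan forward from the reference frame, accumulate the total intensity change D, and stop at the frame just before D stops increasing (the text notes the change becomes zero or negative after the local maximum):

```python
import numpy as np

def total_change(ref, frame, threshold=17):
    """Total amount of intensity change D between a reference and a candidate frame."""
    diff = np.abs(ref.astype(np.int16) - frame.astype(np.int16))
    return int((diff > threshold).sum())

def select_next_frame(frames, ref_idx, threshold=17):
    """Return the index of the first local maximum of D after frames[ref_idx].

    Hypothetical sketch of Algorithms 1-2; the paper's exact stopping
    criterion may differ.
    """
    ref = frames[ref_idx]
    prev_d = -1
    for i in range(ref_idx + 1, len(frames)):
        d = total_change(ref, frames[i], threshold)
        if d <= prev_d:      # change is zero or negative: previous frame was LM
            return i - 1
        prev_d = d
    return len(frames) - 1   # no drop observed before the sequence ended
```

Applied twice, first with Id1 and then with the returned Id2 as the reference, this yields the three selected frames used in place of consecutive ones.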

Assumption:
The complete moving object region is detected in the second selected frame by applying the logical AND operator to the non-overlapping moving object regions Of1 and Of2 of the frame difference outputs If1 and If2.

Proof:
Let Od1, Od2 and Od3 denote the non-overlapping moving object regions of the first (Id1), second (Id2) and third (Id3) selected images, respectively. Since the objects do not overlap across the selected images, the pairwise intersections of these regions are empty:

Od1 ∩ Od2 = ∅, Od2 ∩ Od3 = ∅, Od1 ∩ Od3 = ∅. (4)
The moving object regions of the frame difference outputs If1 and If2 are

Of1 = Od1 ∪ Od2, Of2 = Od2 ∪ Od3. (5)

The logical AND operator is expressed as the intersection of these regions:

Of1 ∩ Of2 = (Od1 ∪ Od2) ∩ (Od2 ∪ Od3) = Od2 ∪ (Od1 ∩ Od3). (6)

Since Od1 ∩ Od3 = ∅, this expression simplifies to

Of1 ∩ Of2 = Od2, (7)

which verifies that the moving object region of the second selected image, Od2, is obtained.
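The set identity behind the proof can be checked directly by modeling the object regions as sets of pixel coordinates (the coordinates below are arbitrary illustrative values):

```python
# Pairwise-disjoint object regions of the three selected frames.
od1 = {(1, 1), (1, 2)}
od2 = {(5, 5), (5, 6)}
od3 = {(8, 8), (8, 9)}

# Each frame difference output is the union of the objects of the two
# subtracted frames.
of1 = od1 | od2
of2 = od2 | od3

# (Od1 ∪ Od2) ∩ (Od2 ∪ Od3) = Od2 ∪ (Od1 ∩ Od3) = Od2 when the regions
# are pairwise disjoint.
assert of1 & of2 == od2
print("intersection recovers Od2:", of1 & of2 == od2)
```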

Experimental Results
We first tested our proposed image selection algorithm on three standard benchmark video datasets: the CAVIAR 1 dataset, the Fast Moving Object (FMO) 2 dataset and the LASIESTA 3 dataset. We then analyzed the proposed STFD method on a real indoor video dataset obtained from camera traps in our laboratory, called the CTRIN 4 dataset. Table 1 shows detailed information for some videos of these datasets. To verify the computational simplicity and efficiency of our method, we compare it with existing difference-based methods on all datasets. Our experiments were carried out with the Anaconda distribution and Python 3.7, executed on an Intel(R) Xeon(R) X3440 CPU @ 2.53 GHz with 8 GB RAM. As shown in the overall structure of the proposed method in Fig. 1, the first image, the kth frame denoted Id1, is selected directly when object motion is detected in the image frames. The second image Id2 and the third image Id3 are selected by computing the local maximum values LM1 and LM2 over the consecutive images, as described in Algorithms 1 and 2. As seen in Fig. 1, the fourth frame, (k+3)th, is the first local maximum; the change in the difference is zero or negative in the following frame. The third image Id3 is then selected by computing the local maximum LM2 with respect to the second selected frame Id2. As seen in Fig. 1, the ninth frame, (k+8)th, is the second local maximum. The proposed image selection method thus yields discrete, non-overlapping moving object regions in the output of the frame difference operation. As seen in Fig. 1, although the moving object regions appear non-overlapping in the frame difference outputs, neither the (k+5)th nor the (k+6)th frame is selected as the third image, because these frames contain the shadow of the moving object.
At the end of the algorithm, the three frames Id1, Id2 and Id3 are selected for use in the TFD method instead of consecutive images. As shown in Fig. 3, the moving objects of the second selected image are detected by applying the logical AND (&) operator to the selected frame difference outputs, as described in Equation (7). The TFD output clearly shows that complete information about the moving object is obtained. In the experiments, we require three parameters for suppressing noise and small objects: a global threshold, the median filter box size and a threshold for disregarding small objects. The global threshold is set to 17, the small-object threshold to 1/50 of the image size and the median filter box size to 1/35. We observed that a median filter box size ratio of 1/50 also yielded successful results on many datasets. The accuracy of the proposed method was measured using the recall, precision and F-score values given in Equations (8) and (9). Table 2 presents the precision, recall and F-score values of the proposed method on the benchmark CAVIAR, FMO and CTRIN datasets. As shown in the table, a low precision value is obtained on the CAVIAR Walk2 video. The reason is that a slowly moving object causes loss of information, considering the different places in which the moving target appears.
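The precision, recall and F-score values referenced in Equations (8) and (9) are the standard detection metrics and can be computed as follows (a generic sketch; the exact rule used to count true positives, false positives and false negatives follows the dataset ground truth and is not specified here):

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall and F-score from detection counts.

    tp: correctly detected object regions (true positives)
    fp: spurious detections (false positives)
    fn: missed objects (false negatives)
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score
```

For example, 8 correct detections with 2 false alarms and 2 misses give precision, recall and F-score of 0.8 each.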
On the other hand, because the image frames of the FMO dataset contain comparatively fast-moving objects, the accuracy increases. For our own CTRIN video dataset, the precision and recall values are significantly high. One possible reason for this relatively good performance might be the elimination of shadow effects in the indoor environment. However, for all datasets, we observed that the accuracy of the proposed STFD method decreases when multiple moving objects are present in the image frames. Table 3 shows the confusion matrix results of the proposed STFD method on all datasets. Fig. 4 shows sample output frames of the proposed STFD and TFD methods on all datasets. The goal of this comparison is to demonstrate the effectiveness of our method on video frames containing various types of actions and environments. The proposed STFD method detects moving objects with higher accuracy than the TFD method, which provides only the edges of moving objects, regardless of whether the videos were taken in indoor or outdoor environments. We also compared the proposed STFD method with TFD and with traditional background-subtraction-based moving object detection methods (GMG [29], MOG [30], MOG2 [31], GRA [32], CNT [33], GSOC [34], KNN [35], LSBP [36]) in terms of F-score on the LASIESTA dataset. Since the ground truth of the LASIESTA dataset contains pixel-based segmentation results for the background and foreground classes in all frames, the accuracy of the proposed approach can be analyzed and compared with the other methods more reliably. F-score values are therefore calculated from per-pixel class evaluation against the ground truth instead of bounding-box and IoU based evaluation. The detailed results are shown in Table 4.
As seen from the table, the accuracy of STFD is substantially higher than that of TFD and of the traditional background-subtraction-based moving object detection methods. In addition, we compared the processing time of the proposed STFD method with the TFD method and the background-subtraction-based methods on the CTRIN dataset, as shown in Table 5. As seen in the table, the proposed method has the best processing time performance among all approaches.

Conclusion
We proposed a new algorithm, STFD, to detect moving objects without any post-processing. The STFD method improves the three frame difference method by selecting frames with non-overlapping moving object regions in the video. The proposed algorithm selects images by computing local maximum difference values over the image sequence instead of using consecutive frames. We mathematically proved that the moving object of the second selected image is obtained by applying the AND (&) operator to the frame differences. We tested the proposed STFD method on benchmark datasets and on real data collected in our laboratory. We also compared the proposed method with the TFD method and with traditional background-subtraction-based moving object detection methods. We observed the remarkable result that complete information about the moving object is obtained more accurately than with the TFD method. The experimental results show that, regardless of indoor or outdoor environment, the accuracy of the proposed algorithm is quite high when only one moving object is present. When multiple moving objects are in the scene, the accuracy decreases and noisy results are obtained. Despite this negative result, the accuracy of STFD is still significantly higher than that of the TFD method.