A Note on Background Subtraction by Utilizing a New Tensor Approach

Abstract: This study deals with determining the foreground region by background subtraction based on a new tensor decomposition method. To this end, the Common Matrix Approach (CMA) is used for background modelling. The performance of the proposed method is validated through experiments on real videos from the Wallflower dataset. The obtained results are compared with well-known methods using subjective and objective evaluation measures. The results indicate that using the CMA algorithm for background modelling is a simple and effective technique in terms of computational cost and implementation. Overall, we have observed that the best results are obtained on complex backgrounds that include dynamic objects and illumination variation in the image sets.


Introduction
Foreground detection is a central topic in computer vision applications such as intelligent visual surveillance, intelligent visual observation of animals and insects, optical motion capture, human-machine interaction, and content based video coding. The most common application areas are road surveillance, airplane surveillance, maritime surveillance, and boat and store surveillance systems, in which people are the main objects of interest [1].
Major challenges associated with background subtraction include shadows, waving trees, fountains, intensity changes and camera jitter, which are collectively referred to as dynamic backgrounds. Although no perfect solution to these problems has been proposed, a sound method should be able to alleviate all of them. The general idea is to build a mathematical model that represents all image sequences of the processed background scene with a single information-rich frame. Once the background model is obtained, the difference between the test frame and the model is taken as the foreground, as in traditional background modelling.
Numerous algorithms based on statistical or mathematical theory have been proposed for background subtraction. According to how they handle images, they can be grouped into 2-D based methods and tensor based methods. In 2-D based methods, each M×N frame is converted into a vector and a 2-D matrix of dimension (M·N)×K is constructed, where K denotes the number of frames in the training set. Conversely, in tensor based (3-D) methods, a set of 2-D frames is stacked and the background is modelled through the tensor without converting the frames into vectors. The 2-D based methods have a disadvantage compared with the tensor based ones: because all columns of a frame are concatenated back to back when the frame is vectorized, the spatial information carried by neighbouring pixels is neglected.
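The two data layouts can be illustrated with a short NumPy sketch. The frame size, the number of frames and all variable names are hypothetical, and column-major flattening is assumed only because the text describes columns being joined back to back.

```python
import numpy as np

# Hypothetical sizes: K frames of M x N pixels.
M, N, K = 4, 5, 3
frames = [np.arange(M * N, dtype=float).reshape(M, N) + k for k in range(K)]

# 2-D based layout: each frame becomes one column of an (M*N) x K matrix.
# order="F" stacks the columns of a frame back to back.
matrix_2d = np.stack([f.flatten(order="F") for f in frames], axis=1)

# Tensor based layout: frames are stacked along a third axis, M x N x K.
tensor_3d = np.stack(frames, axis=2)

# In the vectorized layout, pixel (row, col) sits at index col*M + row, so
# horizontally adjacent pixels end up M entries apart.
row, col = 2, 1
i = col * M + row
j = (col + 1) * M + row
print(matrix_2d.shape, tensor_3d.shape, j - i)
```

In the tensor layout both the row and the column neighbourhood of a pixel remain directly addressable, which is the property the tensor based methods rely on.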
Various tensor decomposition based methods have been presented in the background subtraction literature. The Diffusion Bases (DB) [2] methodology decomposes the 3-D data onto a 2-D plane, which represents the resulting background model. Incremental tensor based background modelling [3] has been investigated for foreground segmentation and tracking. As an alternative to Principal Component Analysis (PCA), the concept of Locality Preserving Projections (LPP) [4] has been applied, which is referred to as LoPP. An optimal rank-(R1, R2, …, Rn) tensor decomposition [5] model has been proposed in order to reduce the high-dimensional tensor to a low-dimensional one with sparse irregular patterns. In addition, the tensor singular value decomposition in the Fourier domain, named t-SVD [6], has been analyzed for multilinear data completion and denoising.
Because background datasets pose different challenges, the existing methods do not meet all expectations. For this reason, a new tensor based background learning and change detection algorithm is presented for successfully discriminating foreground from background. Specifically, the theory of the Common Matrix Approach (CMA) is applied to decompose the 3-D data (tensor) [7]. For the orthogonal decomposition, Gram-Schmidt orthogonalization is adopted. After the projection stage, a common matrix that serves as the background model is obtained. To report statistical and visual results, the test stage is conducted on the Wallflower dataset [8,9]. The remainder of the paper is organized as follows. In Section 2, the CMA and its application to foreground extraction are presented. In Section 3, the obtained objective and subjective results are compared with other tensor based approaches. Finally, conclusions are drawn.

Principle of CMA and Its Application to Background Subtraction
The CMA algorithm is an extended form of the Common Vector Approach (CVA), a subspace based method that has been used for face recognition [10], spam classification [11], image denoising [12] and edge detection [13]. However, the ability of CMA for background modelling has not been explored in the computer vision literature. In CVA the data are handled in vector format: a 2-D matrix is built from the training set and a matrix decomposition strategy is applied to it. In CMA, by contrast, a tensor is generated from the 2-D frames.
The main idea behind CMA is to combine background information from different frames into a single frame that summarizes the cues about background locations. Assume that we are given n sample frames $S_1, S_2, \ldots, S_n$, each in 2-D form. In the context of CMA, each frame can be represented as the sum of a common frame and a difference frame, as shown in Eq. (1).

$$S_k = S_{com} + S_{k,diff}, \quad k = 1, 2, \ldots, n \qquad (1)$$

Here $S_{com}$ and $S_{k,diff}$ refer to the common and difference frames, respectively. In order to calculate the common frame, a 3-D tensor is constructed and Gram-Schmidt orthogonalization is applied to derive an orthonormal basis. First, difference matrices are calculated by taking the first frame as the reference. Any other frame can be chosen as the reference instead.
When the training data are highly correlated and the rank of the data falls below 2, the CMA procedure yields a common matrix that is not meaningful and cannot be interpreted by the human eye. To overcome this problem, a small noise value between 0 and 1 is injected into each difference subspace in Eq. (2) in order to reduce the correlation among the processed images. From Fig. 1, we can observe that the decomposed tensor generates two components: (1) the first component holds the common matrix of the training set, which represents the acquired background model; (2) the other component holds the difference matrix, which captures the detail features of the training set.
With CMA, the foreground and other changes appear in the difference matrix. The strategy behind CMA therefore provides a new way to detect moving and stable objects in a given dataset.
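The training stage described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the Frobenius inner product is assumed for Gram-Schmidt (the source does not name the inner product), the noise injection step is omitted, and the function names `gram_schmidt`, `project` and `common_matrix` as well as the synthetic data are hypothetical.

```python
import numpy as np

def gram_schmidt(mats, tol=1e-10):
    """Orthonormalise equally sized matrices (assumed Frobenius inner product)."""
    basis = []
    for b in mats:
        r = b.astype(float).copy()
        for z in basis:
            r -= np.sum(r * z) * z          # remove the component along z
        norm = np.linalg.norm(r)
        if norm > tol:                      # skip linearly dependent inputs
            basis.append(r / norm)
    return basis

def project(frame, basis):
    """Projection of a frame onto the span of the basis (its difference part)."""
    out = np.zeros_like(frame, dtype=float)
    for z in basis:
        out += np.sum(frame * z) * z
    return out

def common_matrix(frames, ref=0):
    """Common matrix of a frame stack, using frame `ref` as the reference."""
    frames = np.asarray(frames, dtype=float)
    diffs = [frames[k] - frames[ref] for k in range(len(frames)) if k != ref]
    basis = gram_schmidt(diffs)
    return frames[ref] - project(frames[ref], basis), basis

# Synthetic training set: one background plus small per-frame perturbations.
rng = np.random.default_rng(0)
background = rng.uniform(0, 255, size=(6, 8))
frames = np.stack([background + rng.normal(0, 5, size=(6, 8)) for _ in range(4)])

s_com, basis = common_matrix(frames)             # first frame as reference
s_com_alt, _ = common_matrix(frames, ref=2)      # a different reference frame
print(np.allclose(s_com, s_com_alt))
```

Under these assumptions the common matrix does not depend on which training frame serves as the reference, which is consistent with the remark that a different reference frame can be chosen.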
In order to reveal the foreground objects, the common matrix of the test frame (F) is determined from the projection of the incoming test frame onto the orthonormal basis returned by Gram-Schmidt. As in the training stage, the common matrix of the test frame is computed by subtracting its difference matrix from the test frame.
To reveal the foreground objects, the difference between the common matrix of the processed video and the common matrix of the processed frame is taken into account.
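The test-stage computation described above can be written compactly as follows. This is a hedged sketch and not the authors' original equations: the symbols $Z_j$ (orthonormal basis matrices returned by Gram-Schmidt), $m$ (number of basis matrices), $F_{diff}$, $F_{com}$ and $D$ are names introduced here for illustration, and the Frobenius inner product $\langle A, B \rangle = \sum_{i,j} A_{ij} B_{ij}$ is an assumption, since the source does not specify the inner product.

```latex
\begin{align*}
F_{diff} &= \sum_{j=1}^{m} \langle F, Z_j \rangle \, Z_j, \\
F_{com}  &= F - F_{diff}, \\
D        &= S_{com} - F_{com}.
\end{align*}
```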
The difference of the two common matrices thus reveals the foreground objects. For the Moved Object and Camouflage videos, the signed difference of the two common matrices is used to find the foreground regions; for the other videos, the absolute difference is used. To obtain visually pleasing results, a fixed sequence of morphological operations is applied to the foreground mask. First, a 5x5 median filter is applied to the binary image. Connected components with a size of less than 20 are considered ghosts and are removed by the area opening operator. Then, morphological closing with a disk structuring element of size 5 is applied, and holes in the binary image are filled with the morphological filling operator. Finally, morphological opening with a disk structuring element of size 5 is performed to mitigate the effect of the closing operator.
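The post-processing chain listed above can be sketched with `scipy.ndimage`. The sketch rests on assumptions not stated in the source: the disk "size of 5" is read as a radius of 5 pixels, component size is measured in pixels, and the helper names `disk` and `clean_mask` and the synthetic mask are hypothetical.

```python
import numpy as np
from scipy import ndimage

def disk(radius):
    """Boolean disk-shaped structuring element (radius is an assumption)."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x * x + y * y <= radius * radius

def clean_mask(mask, min_area=20, radius=5):
    """Median filter, area opening, closing, hole filling, then opening."""
    # 5x5 median filter on the binary image.
    mask = ndimage.median_filter(mask.astype(np.uint8), size=5).astype(bool)
    # Area opening: drop connected components smaller than min_area pixels.
    labels, count = ndimage.label(mask)
    areas = np.bincount(labels.ravel(), minlength=count + 1)
    keep = areas >= min_area
    keep[0] = False                      # label 0 is the background
    mask = keep[labels]
    se = disk(radius)
    mask = ndimage.binary_closing(mask, structure=se)
    mask = ndimage.binary_fill_holes(mask)
    return ndimage.binary_opening(mask, structure=se)

# Synthetic raw mask: one large object with a small hole, plus a tiny ghost.
raw = np.zeros((60, 60), dtype=bool)
raw[15:45, 15:45] = True
raw[28:31, 28:31] = False
raw[3:5, 3:6] = True
cleaned = clean_mask(raw)
print(cleaned[29, 29], cleaned[:10, :10].any())
```

In this toy case the hole inside the object is filled and the small ghost region disappears, while the corners of the object are rounded by the disk-shaped structuring element.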

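Table 1. Visual results of the compared methods on the Wallflower videos (images not reproduced here). Recoverable row labels: MOG (Stauffer et al.), SL-PCA (Oliver et al.), SL-IRT (Li et al.).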

Dataset
The experiments are conducted on the well-known Wallflower dataset [9]. Numerous methods have been evaluated on this dataset for objective and subjective performance comparison.

Subjective Results
In the present work, a simple thresholding scheme is used to reveal the binary skeleton of objects. Since the difference of the two common matrices captures the changes, a fixed threshold is applied to the absolute difference. The obtained visual results are shown in Table 1.
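The fixed thresholding step can be sketched as follows. This is an illustration under assumptions, not the authors' code: the threshold value of 30 is hypothetical (the source does not report it), QR factorization is used as a numerically stable stand-in for Gram-Schmidt on the vectorized difference frames, and the helper `common_part` and the synthetic scene are invented for the example.

```python
import numpy as np

def common_part(frame, q):
    """Remove from `frame` its projection onto the orthonormal columns of q."""
    v = frame.ravel()
    return (v - q @ (q.T @ v)).reshape(frame.shape)

rng = np.random.default_rng(1)
background = rng.uniform(50, 200, size=(20, 20))
train = np.stack([background + rng.normal(0, 2, size=(20, 20)) for _ in range(5)])

# Orthonormal basis of the difference subspace, first frame as reference.
diffs = (train[1:] - train[0]).reshape(4, -1).T   # shape (400, 4)
q, _ = np.linalg.qr(diffs)                        # reduced QR, q is (400, 4)

s_com = common_part(train[0], q)                  # common matrix of the video

# Test frame: same background with a bright 5x5 foreground object.
test = background + rng.normal(0, 2, size=(20, 20))
test[5:10, 5:10] += 80
f_com = common_part(test, q)                      # common matrix of the frame

threshold = 30.0                                  # hypothetical fixed threshold
mask = np.abs(s_com - f_com) > threshold
print(mask.sum())
```

In practice the threshold would need to be tuned per dataset, since it trades missed foreground pixels against false detections caused by noise and illumination changes.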
To judge the performance of the proposed method subjectively, the obtained visual results are compared with state-of-the-art subspace and other methods, namely Single Gaussian (SG) [15], Mixture of Gaussian (MOG) [16], Kernel Density Estimation (KDE) [17], Subspace Learning PCA (SL-PCA) [18], Subspace Learning ICA (SL-ICA) [19], Subspace Learning via Incremental Non Negative Matrix Factorization (SL-INMF) [20] and Subspace Learning via Incremental Rank-(R1, R2, R3) Tensor (SL-IRT) [21]. For this purpose, the visual results reported in the work of Bouwman [22] are taken as the basis for the performance comparison.
In Table 1, the first column gives the method's name and the other columns correspond to the individual videos. The first and second rows show the test image and the related ground truth, and the remaining rows show the visual results returned by each method. From these results, we can observe that the methods produce similar foreground objects in terms of the obtained foreground skeleton.
Analysing the results, one can note that the results of MOG and KDE are closest to each other and better than those of the SG method. SG, MOG and KDE are weak against illumination changes because of their stochastic nature, as they rely on the historical probability of pixels. In contrast, the subspace based methods are more robust to light changes.
Comparing PCA, ICA, INMF and IRT, we can note that IRT gives the worst result in terms of preserving the foreground skeleton. While INMF shows good results on the Bootstrap video, the same performance is not sustained on the Camouflage video.
Moreover, the results of PCA are similar to those of the CMA method; however, the PCA method fails on the crowded indoor scene (Bootstrap). Furthermore, the proposed method is not only robust to dynamic structures but also resistant to illumination changes in foreground detection.

Conclusion
In this study, the impact of CMA is investigated for background modelling based foreground detection. The performance of the proposed method is compared with other well-known methods on dynamic backgrounds including Moved Object, Time of Day, Light Switch, Waving Trees, Camouflage, Bootstrap and Foreground Aperture. From the objective and subjective evaluation, it has been observed that the proposed method produces visually pleasing results. The experimental results show a significant performance difference between the subspace methods (PCA, ICA, INMF) and the probabilistic methods (SG, MOG and KDE) in terms of accuracy and robustness to dynamic changes among the images of a given video. From the overall evaluation, one can emphasize that a smart post-processing procedure is greatly needed both to reveal the region of the foreground object accurately and to eliminate the noisy pixels caused by uncontrolled changes such as waving trees and illumination changes. As future work, we aim to develop a comprehensive and universal background subtraction method based on the concept of CMA.