FAST MOVING OBJECTS DETECTION USING iLBP BACKGROUND MODEL

In this paper, a new approach to moving object detection in video surveillance systems is proposed. It is based on the iLBP (intensity local binary patterns) descriptor, which combines the classic LBP (local binary patterns) with the multiple regressive pseudospectra model. The iLBP descriptor is considered together with a computational algorithm based on the sign representation of the image. We show that motion analysis methods based on iLBP allow uniform detection of objects that move at different speeds or even stop for a short while, as well as unattended objects. We also show that the proposed model is comparable in quality to the most popular modern background models, but is significantly faster.


INTRODUCTION
One of the key problems in intelligent video surveillance systems is the fast detection of moving objects. It is usually solved by building a background model of the scene and computing the difference between the model and the current frame or a group of recent frames. The most common background models are GMM (Gaussian Mixture Model) (Stauffer, 1999) and KDE (Kernel Density Estimator) (Elgammal, 2000). A fast model based on multiple regressive pseudospectra was also proposed (Vishnyakov, 2012). All these methods, despite their individual advantages, share a serious drawback: they accumulate information about the luminance distribution over a finite period of time in every pixel independently, and the relations between neighboring pixels are not taken into account.

Recently, several papers have described every pixel by a descriptor of its surroundings (Heikkila, 2004; Heikkila, 2006). For example, the texture-based method (Heikkila, 2004) models the background with a group of histograms based on local binary patterns. Because LBP histograms capture region texture features, they help to avoid labelling moving background pixels as foreground. However, detection performance declines sharply when the scene changes strongly. As a further improvement of the Heikkila approach, dynamic background modelling and subtraction based on spatio-temporal local binary patterns was introduced in (Zhang, 2008), and modelling of the pixel process with scale invariant local patterns was introduced in (Liao, 2010) to handle illumination variations. Such approaches substantially improve the quality of both background models and video analysis algorithms. However, their computational complexity makes them almost unacceptable when real-time processing is needed for a multi-camera (rather than single-camera) setup.

In this work, we propose a new iLBP (intensity local binary patterns) descriptor and build a fast background model on its basis. Together with a regressive estimate of the value of an individual pixel, we use a statistical estimate of the LBP descriptor components. This approach stabilizes the value of the descriptor and yields a background model that is robust to changes in lighting conditions and is applicable to a real-time multi-camera setup.

iLBP descriptor
LBP descriptors, for all their advantages, have some significant drawbacks. The main one is that intensity information is completely ignored when LBP descriptors are compared. This can lead to a paradoxical situation (a wrong pixel comparison result) in which the intensity values of two pixels differ drastically, but their LBP descriptors are identical.
On the other hand, it is obvious that, within a chosen scene, a local intensity change at the point of interest is very important. To overcome this drawback, we define the $iLBP(x, y)$ descriptor as a collection of the $LBP(x, y)$ descriptor value and the intensity value $I(x, y)$ of the image:

$$iLBP(x, y) = \{LBP(x, y),\ I(x, y)\}$$

This definition leads to the following formula for the distance $d_{iLBP}$ between the iLBP descriptors at the image points $p$ and $q$:

$$d_{iLBP}(p, q) = H\big(LBP(p), LBP(q)\big) + \alpha\,|I(p) - I(q)|$$

where $\alpha$ is a proportionality factor and $H(\cdot)$ is the Hamming distance between two LBP descriptors.
The proportionality factor $\alpha$ can be chosen in the range of $2^{-8}$ to $2^{-4}$ to match the possible values of the Hamming distance.
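As an illustration, the descriptor and its distance can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the neighbour ordering, the "neighbour >= centre" comparison rule, the use of the absolute intensity difference, and the names `lbp_code`, `hamming`, `ilbp_distance` and `alpha` are all chosen here for illustration.

```python
# Assumed neighbour order: clockwise, starting from the top-left neighbour.
NEIGHBOUR_OFFSETS = [(-1, -1), (0, -1), (1, -1), (1, 0),
                     (1, 1), (0, 1), (-1, 1), (-1, 0)]


def lbp_code(img, x, y):
    """8-bit LBP code of the interior pixel (x, y) of a 2D list of intensities.

    Bit i is set when neighbour i is not darker than the centre
    (assumed comparison rule).
    """
    centre = img[y][x]
    code = 0
    for bit, (dx, dy) in enumerate(NEIGHBOUR_OFFSETS):
        if img[y + dy][x + dx] >= centre:
            code |= 1 << bit
    return code


def hamming(code_a, code_b):
    """Number of differing bits between two LBP codes."""
    return bin(code_a ^ code_b).count("1")


def ilbp_distance(desc_p, desc_q, alpha=2 ** -5):
    """Distance between two iLBP descriptors given as (lbp_code, intensity).

    Assumed form: Hamming distance of the binary parts plus alpha times
    the absolute intensity difference.
    """
    lbp_p, intensity_p = desc_p
    lbp_q, intensity_q = desc_q
    return hamming(lbp_p, lbp_q) + alpha * abs(intensity_p - intensity_q)


# Two flat patches of very different brightness share the same LBP code.
dark = [[10] * 3 for _ in range(3)]
bright = [[200] * 3 for _ in range(3)]
desc_dark = (lbp_code(dark, 1, 1), dark[1][1])
desc_bright = (lbp_code(bright, 1, 1), bright[1][1])
distance = ilbp_distance(desc_dark, desc_bright)
```

For the two flat patches the LBP codes coincide, so only the intensity term separates the descriptors. With `alpha` in the stated range, an intensity difference spanning most of the 8-bit scale contributes an amount comparable to a Hamming distance of a few bits.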

iLBP background model
We propose an approach to motion detection that uses a dynamically changing background model. In contrast to classic methods, we consider an image not as a function of intensity, but as a set of iLBP descriptors computed for every point of the image.
Every descriptor contains an intensity value and a binary vector, and this dual nature is a problem. The background model can therefore be treated as two independent models, the first corresponding to the binary part (LBP) and the second to the image intensity part. These models are combined at the last stage of processing for motion detection and segmentation of moving regions.
It is convenient to use the regressive model for the "image" part, as it has already proved very fast and reliable in solving motion detection problems (Vishnyakov, 2012). For the "binary" part we propose a simple statistical model. Let us consider $N$ consecutive frames of the video, and let $T$ be the current frame.
For the image part, let us consider the accumulator of the regression pseudospectra $A_N(x, y, T)$ (Vishnyakov, 2012), where $I(x, y, T)$ is the intensity of pixel $(x, y)$ at frame $T$, $A_N(x, y, T)$ is the value of the $N$-frame accumulator at pixel $(x, y)$, "$N$-frame" means all frames from frame $(T - N)$ up to frame $T$, and $\beta \in [0, 1]$ is the regression parameter.
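A recursive accumulator of this kind is commonly written as an exponentially weighted running average. The sketch below assumes that form, which may differ from the exact recursion in (Vishnyakov, 2012); the function name `update_accumulator` and the parameter name `beta` are hypothetical.

```python
def update_accumulator(accumulator, frame, beta):
    """One recursive update of a per-pixel running average.

    Assumed form: new = (1 - beta) * old + beta * intensity, with beta in
    [0, 1]. Larger beta forgets old frames faster, so it plays the role of
    a shorter accumulator.
    """
    return [
        [(1.0 - beta) * old + beta * intensity
         for old, intensity in zip(acc_row, frame_row)]
        for acc_row, frame_row in zip(accumulator, frame)
    ]


# A single pixel that jumps from 0 to 100 and stays there.
accumulator = [[0.0]]
for _ in range(2):
    accumulator = update_accumulator(accumulator, [[100]], beta=0.25)
```

Only the previous accumulator value and the current frame are needed at each step, so the cost per pixel does not depend on the accumulator length.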
For the binary part of the scene, the point $(x, y)$ corresponds to a sequence of LBP descriptors on the considered frames:

$$LBP_N(x, y, T) = \{LBP(x, y, T),\ LBP(x, y, T-1),\ \dots,\ LBP(x, y, T-N)\}$$

Without the intensity components of the descriptors, this sequence can be considered as a realization of stationary random processes $n_1(T), \dots, n_8(T)$, $n_i(T) = LBP_i(x, y, T)$. Here $LBP_i(x, y, T)$ is bit $i$ of the LBP descriptor computed for the point $(x, y)$ on frame $T$. Every random process corresponds to a pixel in the neighbourhood of the central pixel $(x, y)$ and to a bit in the LBP code (Fig. 2).
Each bit of the background descriptor is estimated as a quantile of the corresponding process, where $q \cdot N$ is the quantile level and $i = 1, \dots, 8$ is the bit number. Thus, the final binary part of the descriptor is assembled as in (2).
For moving object detection we propose to use, similarly to (6), a sum of the differences of independent background models calculated for different parameters $N_1$ and $N_2$. For regular use, the parameter $N_2$ can be set to 1 frame. In general, however, $N_2$ can be any number less than $N_1$. We recommend setting $N_2$ equal to $N_1/2$ or $N_1/4$ to improve noise filtering.
The main parameter $N_1$ is the primary accumulator length and can be treated exactly like the accumulator length in (Vishnyakov, 2012). If $N_1$ is set to a relatively small number of frames (8…32), moving objects are detected. If $N_1$ is set to a relatively large number of frames (512…2048), unattended and carried-away objects are detected. Therefore, this approach allows uniform detection of objects that move at different speeds or even stop, as well as unattended or carried-away objects.
To decide whether a pixel $(x, y)$ belongs to the background or to the foreground, we apply a simple threshold to this sum. This method of moving object detection combines the strengths of both textural and intensity-based image comparison techniques.
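The detection step can be sketched as follows, assuming the sum has the same form as the iLBP distance: the Hamming distance between the binary parts of the two background models plus a scaled absolute difference of their intensity accumulators. The names `foreground_mask`, `alpha` and `threshold`, and their default values, are illustrative assumptions.

```python
def popcount(value):
    """Number of set bits in a non-negative integer."""
    return bin(value).count("1")


def foreground_mask(acc_1, acc_2, lbp_1, lbp_2, alpha=2 ** -5, threshold=3.0):
    """Per-pixel foreground decision from two background models.

    acc_1, acc_2: intensity accumulators of the long and short models.
    lbp_1, lbp_2: estimated 8-bit binary parts of the same two models.
    A pixel is foreground when the combined difference exceeds the
    threshold (assumed decision rule).
    """
    mask = []
    for a1_row, a2_row, l1_row, l2_row in zip(acc_1, acc_2, lbp_1, lbp_2):
        row = []
        for a1, a2, l1, l2 in zip(a1_row, a2_row, l1_row, l2_row):
            score = popcount(l1 ^ l2) + alpha * abs(a1 - a2)
            row.append(score > threshold)
        mask.append(row)
    return mask


# One row of two pixels: the first is unchanged, the second differs in
# one LBP bit and by 128 intensity levels.
mask = foreground_mask(
    acc_1=[[100.0, 100.0]],
    acc_2=[[100.0, 228.0]],
    lbp_1=[[0b11110000, 0b11110000]],
    lbp_2=[[0b11110000, 0b11110001]],
)
```

A pixel can cross the threshold through a texture change alone, an intensity change alone, or a combination of both, which is the intended benefit of combining the two parts.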

iLBP computation remarks
A number of significant challenges arise in the practical implementation of the approach described above. The first obvious problem is storing the sequence of LBP descriptors. The easiest way is to use 8 integer variables $b_1, \dots, b_8$, where each $b_i$ stores the history of bit $i$ of the descriptor. The problem of computing quantiles then reduces to the well-known population count problem, which modern processors solve quickly with a dedicated instruction, for example, popcnt().
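The storage scheme can be illustrated with a small sketch. It assumes that each of the 8 integers acts as a shift register holding the last N values of one LBP bit, and that a background bit is set when its population count reaches a fraction `q` of N. The class name `BitHistory`, its methods and the default `q` are hypothetical.

```python
class BitHistory:
    """Per-pixel history of the 8 LBP bits over the last n_frames frames."""

    def __init__(self, n_frames):
        self.n_frames = n_frames
        self.window_mask = (1 << n_frames) - 1  # keeps only the last n_frames bits
        self.bits = [0] * 8                     # one shift register per LBP bit

    def push(self, lbp_code):
        """Append the bits of a new LBP code, dropping the oldest frame."""
        for i in range(8):
            new_bit = (lbp_code >> i) & 1
            self.bits[i] = ((self.bits[i] << 1) | new_bit) & self.window_mask

    def background_lbp(self, q=0.5):
        """Estimate each background bit by thresholding its population count."""
        code = 0
        for i in range(8):
            ones = bin(self.bits[i]).count("1")  # population count of the history
            if ones >= q * self.n_frames:
                code |= 1 << i
        return code


history = BitHistory(n_frames=4)
for code in (0b00000001, 0b00000001, 0b00000001, 0b00000010):
    history.push(code)
estimate = history.background_lbp()
```

Each update costs 8 shifts per pixel and each estimate costs 8 population counts, regardless of the history length, as long as the history fits in one machine word.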
Another difficulty is the need to calculate the LBP descriptors for each frame; most of the processor time in this calculation is spent on pairwise comparisons.
If reconfigurable or custom processors are available, this is not a problem (Boutellier et al., 2012), but for ordinary PC hardware a nontrivial solution is the sign representation of the image. With this approach we consider the image as a set of pairwise comparisons (greater is 1 and less is 0). In addition, symmetric pairwise comparisons are considered equivalent. Such a representation can be shown as a flat undirected graph (Fig. 3).

Figure 3. Presentation of image as an undirected graph.
In this graph, each node corresponds to a pixel of the image, and the weights of the edges correspond to the pairwise comparisons.
Then the Hamming distance between the binary parts of two iLBP descriptors is the number of mismatches of the weights $w$ of the respective edges of the sign representation (since the graph is undirected, one can use the simple rule $w(a, b) = 1 - w(b, a)$, where $a$ and $b$ are adjacent nodes). In the case considered, we do not need to take the orientation of the edges into account, because descriptors are compared at the same points on different frames.
The background model described in the previous section can be easily adapted to the sign representation of the image, so all the operations described there carry over directly. However, computing, storing and comparing sign representations requires half as many processor and memory operations. In addition, this representation shows the relationship of this approach to the morphological image analysis described in (Karkishchenko, 2010), which explains the high stability of the algorithm to changes in brightness and to external noise.
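The saving can be illustrated with a small sketch in which every unordered pair of neighbouring pixels is compared once. It assumes an 8-neighbourhood, so each pixel owns 4 edges (right, down-right, down, down-left) instead of 8 comparisons. The names `sign_representation` and `hamming_at` and the dictionary-based storage are illustrative choices, not the authors' data structure.

```python
# Each pixel owns the edges towards these 4 neighbours; the other 4
# neighbours own the edge that points back at this pixel.
EDGE_DIRECTIONS = [(1, 0), (1, 1), (0, 1), (-1, 1)]


def sign_representation(img):
    """Map each edge (x, y, nx, ny) to 1 if pixel (nx, ny) >= pixel (x, y), else 0.

    Ties are assigned to 1 here, an assumed convention; the rule
    w(a, b) = 1 - w(b, a) only holds exactly for unequal intensities.
    """
    height, width = len(img), len(img[0])
    signs = {}
    for y in range(height):
        for x in range(width):
            for dx, dy in EDGE_DIRECTIONS:
                nx, ny = x + dx, y + dy
                if 0 <= nx < width and 0 <= ny < height:
                    signs[(x, y, nx, ny)] = 1 if img[ny][nx] >= img[y][x] else 0
    return signs


def hamming_at(signs_a, signs_b, x, y):
    """Count mismatching edge signs around pixel (x, y) in two frames.

    Edge orientation is irrelevant because the same edge is compared
    with itself across frames.
    """
    mismatches = 0
    for dx, dy in EDGE_DIRECTIONS:
        outgoing = (x, y, x + dx, y + dy)
        incoming = (x - dx, y - dy, x, y)
        for edge in (outgoing, incoming):
            if edge in signs_a and signs_a[edge] != signs_b[edge]:
                mismatches += 1
    return mismatches


frame_a = [[0, 0, 0], [0, 5, 0], [0, 0, 0]]   # centre brighter than neighbours
frame_b = [[9, 9, 9], [9, 5, 9], [9, 9, 9]]   # centre darker than neighbours
signs_a = sign_representation(frame_a)
signs_b = sign_representation(frame_b)
centre_distance = hamming_at(signs_a, signs_b, 1, 1)
```

For a 3 by 3 image this stores 20 edge signs, whereas computing a separate 8-bit code per pixel would repeat every comparison from both ends.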

TESTING RESULTS
To demonstrate the efficiency and effectiveness of the proposed approach, we compared our experimental results with the well-known background modeling methods GMM (Stauffer, 1999), KDE (Elgammal, 2000) and STLBP (spatio-temporal local binary patterns) (Zhang, 2008) on two video sequences ("highway" and "PETS 2006"). The distinctive feature of the "highway" video is the large number of branches and the moving shadows they cast. The main challenge of the "PETS 2006" video is the large number of moving objects in the background, partially fenced and with low contrast.

Quantitative results for GMM, KDE and the iLBP model are shown in Table 1 and Table 2, and an image comparison is shown in Figure 4. By false negatives we mean the number of foreground pixels that were not detected; by false positives, the number of background pixels that the algorithm labelled as a moving object. Ground truth images, processed images, frames per second, and true positive, false positive and false negative results for GMM, KDE and STLBP were taken from the CDNET change detection video database (Goyette, 2012). Note that all processed images from CDNET were median filtered, whereas the iLBP result images were left unfiltered to show what the method is capable of. The results suggest that the proposed method is not generally inferior to GMM, KDE and STLBP in detection quality, but greatly exceeds them in computation speed and gives reasonable moving object masks in outdoor scenarios.
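The pixel counts used in the tables can be obtained by comparing a detection mask with the ground truth mask, as in the sketch below. It assumes binary masks of equal size in which a truthy value marks a moving object; the function name `confusion_counts` and the toy masks are illustrative and are not taken from the experiments.

```python
def confusion_counts(detected, ground_truth):
    """Return (TP, FP, FN) pixel counts for two binary masks of equal size."""
    true_pos = false_pos = false_neg = 0
    for det_row, gt_row in zip(detected, ground_truth):
        for det, truth in zip(det_row, gt_row):
            if det and truth:
                true_pos += 1      # moving pixel correctly detected
            elif det and not truth:
                false_pos += 1     # background pixel labelled as moving
            elif truth and not det:
                false_neg += 1     # moving pixel that was missed
    return true_pos, false_pos, false_neg


# Toy masks for illustration only.
detected = [[1, 1, 0],
            [0, 0, 1]]
ground_truth = [[1, 0, 0],
                [1, 0, 1]]
counts = confusion_counts(detected, ground_truth)
```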

CONCLUSION
The automatic detection and tracking of moving objects in video is one of the most significant problems in motion analysis as applied to video surveillance and security systems. In this paper, we proposed a new approach to background modeling and subtraction based on a combination of image intensity and binary information in each pixel. For this purpose, we introduced a new descriptor, iLBP, and a fast method for iLBP evaluation on a sequence of images for cases where reconfigurable or custom processors are not available and processing speed is crucial.
The evaluation results of the proposed approach are given for publicly available outdoor scenarios. The achieved speed is more than satisfactory, at the cost of a reasonable loss in overall performance.

Figure 4. Detection of moving objects using iLBP background model. Line 1 - frame numbers of "highway" video, line 2 -

Table 1. Results for GMM, KDE, STLBP and iLBP for the "highway" video. TP - True Positive, FP - False Positive, FN - False Negative.

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XL-3, 2014 ISPRS Technical Commission III Symposium, 5 - 7 September 2014, Zurich, Switzerland