A Complexity and Quality Evaluation of Block Based Motion Estimation Algorithms

Motion estimation is a method by which temporal redundancies are reduced, and is thus an important aspect of video compression algorithms. In this paper we present a comparison among some of the well-known block based motion estimation algorithms. A performance evaluation of these algorithms is given to determine the best algorithm from the point of view of complexity and quality, for noise-free video sequences and also for noisy video sequences.


Introduction
Interframe predictive coding is used to eliminate the large amount of temporal and spatial redundancy that exists in video sequences, and helps in compressing them. In conventional predictive coding the difference between the current frame and the predicted frame (based on the previous frame) is coded and transmitted.
The better the prediction, the smaller the error and hence the transmission bit rate. If a scene is still, then a good prediction for a particular pel in the current frame is the same pel as in the previous frame, and the error is zero. However, when there is motion in a sequence, then a pel on the same part of the moving object is a better prediction for the current pel. Use of knowledge about the displacement of an object in successive frames is called Motion Compensation. There are a large number of motion compensation algorithms for interframe predictive coding. In this study, however, we have focused on a single class of such algorithms, called Block Matching Algorithms. These algorithms estimate the amount of motion on a block-by-block basis, i.e. for each block in the current frame a block from the previous frame is found that is said to match this block, based on a certain criterion. There are a number of criteria to evaluate the "goodness" of the match, some of which are: 1. Pixel Difference Classification (PDC), 2. Mean Absolute Difference (MAD), 3. Mean Squared Difference (MSD).
Mean absolute difference (MAD) is the most commonly used cost function, since it does not need a multiplication operation. PDC counts the number of matching pixels between two blocks.
Mathematically these cost functions can be defined as:

MAD(i, j) = (1/NM) Σ_{k=0}^{N−1} Σ_{l=0}^{M−1} |C(x+k, y+l) − R(x+i+k, y+j+l)|

MSD(i, j) = (1/NM) Σ_{k=0}^{N−1} Σ_{l=0}^{M−1} [C(x+k, y+l) − R(x+i+k, y+j+l)]²

PDC(i, j) = Σ_{k=0}^{N−1} Σ_{l=0}^{M−1} T(k, l), where T(k, l) = 1 if |C(x+k, y+l) − R(x+i+k, y+j+l)| ≤ t for a given threshold t, and T(k, l) = 0 otherwise,

where C(x+k, y+l) and R(x+i+k, y+j+l) are pixels of the current frame's block and the reference frame's block, respectively, and the motion vector is defined by (i, j).
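For concreteness, the three criteria can be sketched in a few lines of Python (our illustration, not code from the paper); frames are plain 2-D lists of luminance values, C is the current frame and R the reference frame:

```python
def mad(C, R, x, y, i, j, N, M):
    """Mean Absolute Difference for candidate motion vector (i, j)."""
    return sum(abs(C[x + k][y + l] - R[x + i + k][y + j + l])
               for k in range(N) for l in range(M)) / (N * M)

def msd(C, R, x, y, i, j, N, M):
    """Mean Squared Difference for candidate motion vector (i, j)."""
    return sum((C[x + k][y + l] - R[x + i + k][y + j + l]) ** 2
               for k in range(N) for l in range(M)) / (N * M)

def pdc(C, R, x, y, i, j, N, M, t):
    """Pixel Difference Classification: count of 'matching' pixels,
    i.e. pixels whose absolute difference is within threshold t."""
    return sum(abs(C[x + k][y + l] - R[x + i + k][y + j + l]) <= t
               for k in range(N) for l in range(M))
```

With MAD and MSD the best match minimizes the cost, while with PDC it maximizes the count of matching pixels.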

Full search algorithm (FS)
The simplest method to find the motion vector for each macro-block is to compute a certain cost function at each location in the search space. This is referred to as the full search algorithm. The cost function used in the full search algorithm is the mean absolute difference MAD. The best matching block is the reference block for which MAD(i, j) is minimized; thus the coordinates (i, j) define the motion vector. The main problem of the full search algorithm is its computational complexity, which can be estimated as follows [1]. For each motion vector there are (2p+1)² search locations. At each location (i, j), N×M pixels are compared. Each pixel comparison requires four operations, namely a subtraction, an absolute-value calculation, an addition, and a division, if the cost of accessing the pixels C(x+k, y+l) and R(x+i+k, y+j+l) is ignored. Thus the total complexity per macro-block is (2p+1)²×NM×4 operations. Then for frame resolution I×J and frame rate F frames per second, the overall complexity is 4×(2p+1)²×I×J×F operations per second. For example, for typical broadcast-TV values N = M = 16, I = 720, J = 480 and F = 30, motion estimation based on the full search algorithm requires 39.85 GOPS (giga operations per second) for p = 15, and 9.33 GOPS for p = 7. This example shows that the full search algorithm is computationally expensive, but it guarantees finding the minimum MAD value. Due to the high computational complexity of the full search, alternative search methods are desirable.
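A minimal Python sketch of the full search (function and variable names are ours) makes the exhaustive structure explicit:

```python
def full_search(C, R, x, y, N, p):
    """Exhaustive block matching: test all (2p+1)^2 displacements of the
    N x N block of current frame C at (x, y) against reference frame R,
    and return the MAD-minimising motion vector with its cost.
    Candidates that fall outside the reference frame are skipped."""
    H, W = len(R), len(R[0])
    best, best_cost = (0, 0), float("inf")
    for i in range(-p, p + 1):
        for j in range(-p, p + 1):
            if not (0 <= x + i <= H - N and 0 <= y + j <= W - N):
                continue
            cost = sum(abs(C[x + k][y + l] - R[x + i + k][y + j + l])
                       for k in range(N) for l in range(N)) / (N * N)
            if cost < best_cost:
                best, best_cost = (i, j), cost
    return best, best_cost
```

Because every one of the (2p+1)² candidates is tested, the minimum found is guaranteed to be global; this is exactly what the GOPS figures above pay for.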

Three step search algorithm (TSS)
This algorithm [2] is simple and robust and also provides near-optimal performance, so it has become very popular. It searches for the best motion vector in a coarse-to-fine search pattern, and can handle displacements of up to ±7 pixels. The algorithm may be described as follows: Step 1: An initial step size is chosen. Eight blocks at a distance of the step size from the center (around the center block) are picked for comparison.
Step 2: The step size is halved. The center is moved to the point with the minimum distortion.
Steps 1 and 2 are repeated until the step size is equal to 1. One problem with the Three Step Search is that it uses a uniformly allocated checking-point pattern in the first step, which becomes inefficient for small motions. For each motion vector there are (8×3+1) = 25 search locations. At each location (i, j), N×M pixels are compared. Each pixel comparison again requires four operations (a subtraction, an absolute-value calculation, an addition, and a division), if the cost of accessing the pixels C(x+k, y+l) and R(x+i+k, y+j+l) is ignored. Thus the total complexity per macro-block is 25×NM×4 operations. Then for frame resolution I×J and frame rate F frames per second, the overall complexity is 4×25×I×J×F operations per second. For example, if I = 720, J = 480 and F = 30, the overall complexity is equal to 1036.8 MOPS, which is just 2.6 % of the operations required by the full search (with p = 15).
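The three steps can be sketched as follows (our illustration); `cost` is any block-distortion measure such as MAD, assumed to return float('inf') for candidates outside the search window:

```python
def three_step_search(cost, step=4):
    """TSS sketch: evaluate the centre plus its eight neighbours at
    distances 4, 2 and 1, re-centring on the best point after each
    step.  An initial step of 4 covers displacements up to +/-7."""
    ci = cj = 0
    while step >= 1:
        best, best_cost = (ci, cj), cost(ci, cj)
        for di in (-step, 0, step):
            for dj in (-step, 0, step):
                if di == 0 and dj == 0:
                    continue
                c = cost(ci + di, cj + dj)
                if c < best_cost:
                    best, best_cost = (ci + di, cj + dj), c
        ci, cj = best
        step //= 2
    return ci, cj
```

On a well-behaved (unimodal) distortion surface the search visits exactly 8×3+1 = 25 distinct points, the count used in the complexity estimate above.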

Four step search (4SS)
This algorithm [3] is based on the characteristic of center-biased motion in real-world image sequences. The algorithm starts with a nine-point comparison, and the other points for comparison are then selected on the basis of the following algorithm: Step 1: Start with a step size of 2. Pick nine points around the search window center. Calculate the distortion and find the point with the smallest distortion. If this point is found to be the center of the searching area, go to step 4; otherwise go to step 2.
Step 2: Move the center to the point with the smallest distortion. Step 3: The search pattern strategy is the same; however, after this step the algorithm will finally go to step 4.
Step 4: The step size is reduced to 1 and all nine points around the center of the search are examined. The computational complexity of the four step search is less than that of the three step search, while the performance in terms of quality is just as good.
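A simplified sketch (ours) of 4SS; the early exit when the center wins is what makes it cheaper than TSS on slow sequences. The full algorithm reduces the pattern to five points for some moves, which is omitted here for brevity:

```python
def four_step_search(cost):
    """Simplified 4SS: up to three 9-point searches with step size 2;
    if the best point is the current centre, the algorithm jumps
    straight to the final step-1 refinement (step 4)."""
    ci = cj = 0
    for _ in range(3):                          # steps 1-3 use step size 2
        cands = [(ci + di, cj + dj) for di in (-2, 0, 2) for dj in (-2, 0, 2)]
        best = min(cands, key=lambda v: cost(*v))
        if best == (ci, cj):                    # centre wins: go to step 4
            break
        ci, cj = best
    # step 4: examine all nine points around the centre with step size 1
    cands = [(ci + di, cj + dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)]
    return min(cands, key=lambda v: cost(*v))
```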

Multi-resolution algorithms
Spatial multi-resolution video sequences provide video at multiple frame sizes, allowing extraction of only the resolution or bit rate required by the user. To illustrate the efficiency of multi-resolution based algorithms [1] in comparison with full-frame based algorithms, assume that the current frame and the reference frame are decomposed into two levels by applying a simple averaging filter (2×2) twice. Using the FS algorithm in the lowest resolution (level 2), the complexity of the algorithm is as follows. Level 2: assume the parameters for broadcast TV (720×480 at 30 frames per second). Then the picture size in level 2 is 180×120, the macroblock size is 4×4, and the number of macroblocks is equal to (180×120)/(4×4) = 1,350 at 30 frames/second. The search window is rescaled by the same factor of four. From the complexity point of view the multi-resolution search algorithm is very efficient; however, such a method requires increased storage due to the need to keep pictures at different resolutions. Also, because the search starts at the lowest resolution, small objects may be completely eliminated and thus fail to be tracked. On the other hand, the creation of low-resolution pictures provides some immunity to noise.
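One level of the 2×2 averaging pyramid, and the level-2 numbers quoted above, can be reproduced in a few lines (a sketch with our naming):

```python
def downsample(frame):
    """One pyramid level: average each non-overlapping 2x2 block.
    Two applications shrink 720x480 to 180x120 and a 16x16
    macroblock to 4x4, leaving the macroblock count unchanged."""
    return [[(frame[2 * r][2 * c] + frame[2 * r][2 * c + 1] +
              frame[2 * r + 1][2 * c] + frame[2 * r + 1][2 * c + 1]) / 4
             for c in range(len(frame[0]) // 2)]
            for r in range(len(frame) // 2)]
```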

Wavelet based algorithms [5]
An efficient multi-resolution tool is the wavelet transform, so we review a robust algorithm based on the wavelet transformation, the MRVBS (Multi-Resolution Variable Block Size) algorithm. It is based on a central search process in three layers, namely layer 2, layer 1, and layer 0 (the original frame). MAD is used as the cost function. The main steps are described as follows: first the current frame and the previous frame are decomposed into two layers of the wavelet domain.
Step 1: in layer 2, the central search process is applied on the low band, i.e., the best match is searched for within the nine neighboring blocks to get an initial motion vector. The block size used in this step is 4×4, and the estimated motion vector is used as the new center of the central search process for the detail bands.
Step 2: the estimated motion vector in the previous step is rescaled and used as the new center for the three highest bands in layer 1 with block size 8×8.
Step 3: from the estimated motion vectors in step 2, the median values are chosen to be rescaled into layer 0 and then used as a new center to estimate the final motion vector by using block size 16×16.
The computational cost of this algorithm, without the wavelet-decomposition complexity, is (36p₂ + 27p₁ + 9p₀), where p₂, p₁, and p₀ are the block sizes in layer 2, layer 1, and layer 0, respectively.

Two dimensional logarithmic search (TDL)
This algorithm was introduced by Jain & Jain [6]. Although this algorithm requires more steps than the Three Step Search, it can be more accurate, especially when the search window is large. The algorithm can be described as follows: Step 1: Choose an initial step size s (a power of two, s = 2^j). Look at the block at the center of the search window and the four blocks at a distance s from it on the X and Y axes (the five positions form a '+' sign).
Step 2: If the position of the best match is at the center, halve the step size. If, however, one of the other four points is the best match, then it becomes the new center, and step 1 is repeated.
Step 3: When the step size becomes 1, all nine blocks around the center are chosen for the search and the best among them is picked as the required block. Many variations of this algorithm exist; they differ mainly in the way in which the step size is changed. Some authors argue that the step size should be halved at every stage; others believe that the step size should also be halved if an edge of the search space is reached.
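A sketch (ours) of the logarithmic '+' search; a unimodal distortion surface is assumed so that the loop terminates:

```python
def tdl_search(cost, step=4):
    """TDL sketch: '+'-shaped search; the step is halved only when
    the best match stays at the centre, otherwise the centre moves
    and the same step is used again."""
    ci = cj = 0
    while step > 1:
        best, best_cost = (ci, cj), cost(ci, cj)
        for di, dj in ((-step, 0), (step, 0), (0, -step), (0, step)):
            c = cost(ci + di, cj + dj)
            if c < best_cost:
                best, best_cost = (ci + di, cj + dj), c
        if best == (ci, cj):
            step //= 2                  # centre wins: refine the step
        else:
            ci, cj = best               # move the centre, keep the step
    # step == 1: pick the best of the nine blocks around the centre
    cands = [(ci + di, cj + dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)]
    return min(cands, key=lambda v: cost(*v))
```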

Orthogonal search algorithm (OSA)
This algorithm was introduced by Puri [7] and is a hybrid of the Three Step Search and the Two Dimensional Logarithmic Search. It has a horizontal stage followed by a vertical stage in the search for the optimal block. The algorithm may be described as follows: Step 1: Pick a step size (usually half the maximum displacement in the search window). Take two points at a distance of the step size in the horizontal direction from the center of the search window and locate (among these three points) the point of minimum distortion. Move the center to this point.
Step 2: Take two points at a distance of the step size from the center in the vertical direction and find the point with the minimum distortion.
Step 3: If the step size is greater than one, halve it and repeat steps 1 and 2; otherwise, halt.
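The alternating one-dimensional searches can be sketched as follows (our illustration):

```python
def orthogonal_search(cost, step=4):
    """OSA sketch: alternate a 3-point horizontal search and a
    3-point vertical search, halving the step after each round.
    step=4 corresponds to a maximum displacement of +/-7."""
    ci = cj = 0
    while step >= 1:
        # horizontal stage: centre and the two points at +/- step
        ci = min((ci - step, ci, ci + step), key=lambda i: cost(i, cj))
        # vertical stage, starting from the best horizontal point
        cj = min((cj - step, cj, cj + step), key=lambda j: cost(ci, j))
        step //= 2
    return ci, cj
```

Only three points are evaluated per direction per round, which is where the low checking-point count of OSA comes from.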

Center-biased orthogonal search algorithm (CBOSA)
The CBOSA algorithm [8] for finding small motions is described below. The CBOSA algorithm is a modification of the orthogonal search algorithm (OSA), reviewed above. The OSA algorithm has fast convergence with few checking points and few searching steps. However, the performance of OSA in terms of MSE is much lower than that of TSS and other fast BMAs. This is because the OSA algorithm does not make use of the center-biased motion vector distribution characteristic of real-world video sequences. In order to tackle this drawback, the CBOSA algorithm uses a smaller step size in the first step, so as to increase the probability of catching the global minimum point.
For the maximum motion displacement of ±7 in both the horizontal and vertical directions, the CBOSA algorithm uses three horizontal checking points with a step size of 2 in the first step (Step 1-H). If the minimum BDM (block distortion measure) is at the center, it jumps to the vertical step (Step 1-V). Otherwise, one more checking point is searched in the horizontal direction. This extra step makes sure that the algorithm can cover the whole search window even when using a small step size of 2 in the first step. Starting from the minimum BDM point found in Step 1-H, Step 1-V uses the same searching strategy as Step 1-H to search in the vertical direction. Then the algorithm jumps to Step 2-H and Step 2-V, respectively. These two steps also use three checking points with a step size of 2 in the horizontal and vertical directions, respectively.
After Step 2-V, the algorithm jumps to Step 3-H and Step 3-V, respectively. Step 3-H and Step 3-V use three checking points with the step size reduced to 1 in the horizontal and vertical directions, respectively.

One at a time algorithm (OTS) [9]
This is a simple but effective way of finding the point with the optimal block. During the horizontal stage, the point in the horizontal direction with the minimum distortion is found. Then, starting from this point, the minimum distortion in the vertical direction is found. The algorithm may be described as follows: Step 1: Pick three points around the center of the search window, along the horizontal direction.
Step 2: If the smallest distortion is at the center point, start the vertical stage; otherwise look at the next point in the horizontal direction, closer to the point with the smallest distortion from the previous stage. Continue looking in that direction until the point with the smallest distortion is found (going in the same direction, the point next to it must have a larger distortion).
Step 3: Repeat the above, but taking points in the vertical direction around the point that has the smallest distortion in the horizontal direction.
This search algorithm requires very little time; however the quality of the match is not very good.
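The one-at-a-time walk can be sketched as follows (our illustration); as noted above, the number of checking points depends on how far the walk travels:

```python
def one_at_a_time(cost):
    """OTS sketch: from the centre, walk one pixel at a time along
    the horizontal axis while the distortion keeps decreasing, then
    do the same vertically from the best horizontal point."""
    ci = cj = 0
    # horizontal stage: try both directions, follow the decreasing one
    best = cost(ci, cj)
    for d in (-1, 1):
        while cost(ci + d, cj) < best:
            ci += d
            best = cost(ci, cj)
    # vertical stage, starting from the best horizontal point
    best = cost(ci, cj)
    for d in (-1, 1):
        while cost(ci, cj + d) < best:
            cj += d
            best = cost(ci, cj)
    return ci, cj
```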

Cross search algorithm (CSA)
This algorithm was introduced by M. Ghanbari [10]. The basic idea in this algorithm is still a logarithmic step search. However, the main difference between this and the logarithmic search method presented above is that the search locations picked are the end points of an "x" rather than a "+". The algorithm may be described as follows: Step 1: The center block is compared with the current block and if the distortion is less than a certain threshold, the algorithm stops.
Step 2: Pick the first set of points in the shape of an "x" around the center (the step size picked is usually half the maximum displacement). Move the center to the point of minimum distortion.
Step 3: If the step size is greater than 1, halve it and repeat step 2, otherwise go to step 4.
Step 4: If in the final stage the point of minimum distortion is the bottom left or the top right point, then evaluate the distortion at 4 more points around it with a search area of a "+".If, however, the point of minimum distortion is the top left or bottom right point, evaluate the distortion at 4 more points around it in the shape of an "x".
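The cross search can be sketched as follows (our illustration); the `threshold` early exit corresponds to step 1, and the final '+'/'x' refinement to step 4:

```python
def cross_search(cost, step=4, threshold=0):
    """CSA sketch: logarithmic search over the end points of an 'x'.
    In the final (step = 1) stage, four extra points are checked in
    a '+' or 'x' pattern depending on which corner won."""
    ci = cj = 0
    if cost(ci, cj) <= threshold:               # step 1: early exit
        return ci, cj
    while step >= 1:
        cands = [(ci, cj)] + [(ci + di, cj + dj)
                              for di in (-step, step) for dj in (-step, step)]
        best = min(cands, key=lambda v: cost(*v))
        if step == 1 and best != (ci, cj):
            di, dj = best[0] - ci, best[1] - cj
            if (di, dj) in ((1, -1), (-1, 1)):  # bottom-left / top-right
                ring = ((-1, 0), (1, 0), (0, -1), (0, 1))       # '+'
            else:                               # top-left / bottom-right
                ring = ((-1, -1), (-1, 1), (1, -1), (1, 1))     # 'x'
            extra = [(best[0] + a, best[1] + b) for a, b in ring]
            best = min([best] + extra, key=lambda v: cost(*v))
        ci, cj = best
        step //= 2
    return ci, cj
```

For an initial step size W the number of checking points is 5 + 4·log₂(W), i.e., 13 for W = 4 (maximum displacement ±7), as used in the complexity comparison below.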

Performance comparison of the motion estimation algorithms
In this section we introduce a comparison between some of the most efficient algorithms from different points of view. The well-known FS, 4SS, TSS, OSA, CBOSA, OTS, and CSA algorithms are compared from the PSNR point of view as well as from the complexity point of view. The comparison is performed for noise-free sequences as well as for noisy sequences with different SNR.

Complexity point of view
FS algorithm: the FS algorithm searches for the best match within a large window [-p:p]×[-p:p], i.e., within (2p+1)² locations. For the simplest cost function, MAD, four operations are performed per pixel, namely one subtraction, one absolute-value computation, one addition, and one division, so the total number of operations for matching just one block is equal to 4NM(2p+1)², where N and M give the block size, and the total number of operations per frame is 4IJ(2p+1)², where I and J give the frame size. This is a very large number of operations, and requires very high speed processors.
TSS algorithm: the TSS algorithm searches for the best match within a [-p:p]×[-p:p] window, with p equal to 7, but only blocks at certain steps within this window are checked. The total number of checked blocks is 25. This means that the total number of operations per frame is 100·(IJ), so it requires just 2.6 % of the operations required by the FS algorithm (with p = 15). Note that data access is not taken into consideration.
4SS algorithm: in 4SS certain conditions for jumping between steps are inserted to avoid overlapping computations. The total number of checked blocks varies between a maximum of 27 and a minimum of 17; on average, 22 blocks have to be checked. This means that the total number of operations per frame is 88·(IJ), and it requires just 2.289 % of the operations required by the FS algorithm.
CSA algorithm: for a maximum displacement of ±7 the CSA algorithm requires (5+4+4) = 13 checking points. In general this can be formulated as 5 + 4·log₂(W), where W is the initial step size; for example, W is chosen equal to 4 for a maximum displacement of ±7.
OTS algorithm: the OTS algorithm is very attractive from the computation point of view. The number of checking points required by the OTS algorithm varies from (3+2) = 5 to (3+1+1+1+1+1+1+2+1+1+1+1+1+1) = 17, i.e., it may take any of the values 5, 6, 7, …, 17. An advantage of this algorithm is that it adds only one checking point at a time until it reaches the minimum distortion.
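The operation counts in this section (4 operations per pixel comparison throughout) can be checked with a few lines of arithmetic:

```python
I, J, F = 720, 480, 30          # broadcast-TV frame size and frame rate
OPS = 4                         # subtract, absolute value, add, divide

fs_p15 = OPS * (2 * 15 + 1) ** 2 * I * J * F    # full search, p = 15
fs_p7 = OPS * (2 * 7 + 1) ** 2 * I * J * F      # full search, p = 7
tss = OPS * 25 * I * J * F                      # TSS: 25 checking points
fss = OPS * 22 * I * J * F                      # 4SS: 22 points on average

print(round(fs_p15 / 1e9, 2), "GOPS")           # 39.85
print(round(fs_p7 / 1e9, 2), "GOPS")            # 9.33
print(round(tss / 1e6, 1), "MOPS")              # 1036.8
print(round(100 * tss / fs_p15, 1), "% of FS")  # 2.6
print(round(100 * fss / fs_p15, 3), "% of FS")  # 2.289
```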

Quality point of view
In this section we introduce the simulation results for a comparison between some well-known algorithms. In the simulation we used two different techniques to search for the best matching blocks: 1. searching within non-overlapped blocks in the search area, as in Fig. 1; 2. searching within overlapped blocks, as in Fig. 2.
It is clear from Figs. 1 and 2 that the difference between the overlapped and non-overlapped block techniques is that the displacement in the overlapped case is measured in pixels, while in the non-overlapped case it is an integer multiple of the block size. Thus the search window covers (2p+1)²·N² pixels for the non-overlapped technique and (2p+1)² pixels for the overlapped technique. These two search windows give the same number of search points, and consequently the same complexity.
The comparison between different algorithms in this section will indicate the effect of three major factors in motion estimation algorithms.
1) The cost function.
2) The block size.
3) The addition of external noise.
The effect of these factors is simulated with almost all the motion types.

The effect of the cost function
The cost function is one of the major factors that affect the complexity of a motion estimation algorithm and consequently its performance. In this section two well-known and widely used cost functions are compared from the point of view of complexity and of their effect on the performance of different algorithms. Mean Square Difference (MSD): to evaluate the MSD, four operations have to be performed per pixel, namely one subtraction, one addition, one squaring operation (a multiplication), and one division. These operations are performed in addition to the data accessing.
Mean Absolute Difference (MAD): MAD also requires four operations: one subtraction, one absolute-value computation, one addition, and one division, plus data accessing. Clearly MAD is simpler than MSD, because an absolute-value operation is required rather than a squaring operation. MAD is therefore preferable to MSD from the complexity point of view.
Tables 1 and 2 present a comparison between the MSD and MAD cost functions for different algorithms with different noise-free video sequences from the PSNR point of view, using overlapped blocks and non-overlapped blocks, respectively. Fig. 3 shows an example of the effect of the cost function. In this example the TSS algorithm is used, with a constant block size (16×16) in both cases, and with the overlapped block technique. This example illustrates that MSD achieves somewhat better quality than MAD (higher PSNR), but the improvement in PSNR is small in comparison with the increase in complexity. We can therefore conclude that MAD is better than MSD when complexity and quality are traded off.

The effect of adding external noise
Video sequences are usually not pure; some noise almost always corrupts them. Noise may come from the camera (camera noise), or from the transmission lines. The algorithm is therefore required to be robust against the addition of noise. In this section the robustness of some algorithms is tested. Here we used white Gaussian noise, with SNR of 25 dB and 20 dB. The results are shown in Tables 3, 4, 5, and 6. An example indicating the relation between noise (SNR) and quality (PSNR) is shown in Fig. 4. In this example FS, TSS, TDL, and 4SS are compared with respect to their robustness to noise. Another example in the same figure indicates the robustness of one algorithm (TSS was chosen) with a different video sequence. These two examples show that as the noise increases the quality decreases, and that the FS algorithm remains the best even with the addition of heavy noise.
The TSS algorithm is the second best algorithm for noisy sequences.
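The noisy test sequences can be generated by scaling zero-mean white Gaussian noise to a target SNR; a sketch with our helper name, shown for a 1-D list of samples (a frame would be processed row by row in the same way):

```python
import math
import random

def add_white_gaussian_noise(samples, snr_db, rng=random):
    """Return samples corrupted by zero-mean white Gaussian noise
    whose power is scaled so that 10*log10(P_signal / P_noise)
    equals snr_db (on average)."""
    p_signal = sum(s * s for s in samples) / len(samples)
    sigma = math.sqrt(p_signal / 10 ** (snr_db / 10))
    return [s + rng.gauss(0.0, sigma) for s in samples]
```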

The Effect of Block Size
The choice of macroblock size, or simply block size (N×M), is the result of a tradeoff among three conflicting requirements. Specifically: 1. Small values for N and M (from four to eight) are preferable, since the smoothness constraint is easily met at this resolution; 2. Small values for N and M reduce the reliability of the motion vector, since few pixels participate in the matching process; 3. Fast algorithms for finding motion vectors are more efficient for larger values of N and M.
In this section we show the effect of the block size on the performance of the algorithms. In the simulation, different block sizes (4×4, 8×8, and 16×16) are compared using both the overlapped block and the non-overlapped block techniques. MAD is used as the cost function, and the search window is kept the same; the results are shown in Tables 7 and 8. Two different examples indicating the block size effect are shown in Fig. 5.
These two examples show that the PSNR decreases as the block size increases; the higher PSNR obtained with small blocks comes at the cost of increased computation time.

Visual results
In this section the reconstructed frames are presented, using the FS and TSS algorithms with MSD as the cost function and the overlapped block technique. For the comparison we used three video sequences, specifically: Claire sequence: a head-and-shoulders sequence with just one moving object and slow motion.
Mother & Daughter sequence: also a head-and-shoulders sequence, but with two moving objects and slow motion.
Football sequence: this represents a multi-object sequence with fast motion.
For these three sequences the reconstructed frames are shown; the comparison is performed by estimating frame number n+k from a reference frame n. For each sequence three cases are considered, with k = 1, k = 4, and k = 7, as shown in Fig. 6 and Fig. 7. The PSNR corresponding to these cases is given in Table 9. The simulation results show that the quality of the reconstructed frame decreases as the number of skipped frames (k) increases. The appearance and disappearance of objects during the sequence also decreases the quality of the reconstructed frames.

Conclusion
From the simulation results we can conclude that:
• There are two techniques for searching for the best match, namely: 1) searching within non-overlapped blocks, and 2) searching within overlapped blocks.
• A comparison between these two techniques was performed using the same searching algorithm, the same block size, the same cost function, and the same complexity (i.e., the same number of search points). The simulation indicates that searching within overlapped blocks is the better technique from the quality point of view.
• The full search algorithm is the best algorithm from the quality point of view, but the worst from the computation time (complexity) point of view.
• The TSS algorithm is the best algorithm from the quality-complexity tradeoff point of view.
• The block size is one of the effective factors in motion estimation algorithms. Small block sizes (such as 4×4 and 8×8) result in good quality, but reduce the reliability of the motion vector, since few pixels participate in the matching process. On the other hand, large block sizes (such as 16×16) are preferable for fast algorithms.
• The cost function affects the complexity of the searching algorithm. A comparison between the MAD and MSD cost functions indicates that MSD achieves better quality than MAD at the cost of increased complexity. MAD is preferable, since the difference in quality is very small.
• The addition of white Gaussian noise affects the direction of the motion vectors; consequently the reconstructed frame has lower quality.

Fig. 3: The effect of the cost function

Fig. 4: The effect of noise on different algorithms, with different sequences
Fig. 5: The effect of block size on different algorithms

Acta Polytechnica Vol. 45 No. 1/2005, Czech Technical University in Prague

Table 1: The effect of the cost function on different algorithms using the overlapped block technique

Table 2: The effect of the cost function on different algorithms using the non-overlapped block technique

Table 3: The effect of adding Gaussian noise with SNR = 25 dB on different algorithms using the overlapped block technique

Table 4: The effect of adding Gaussian noise with SNR = 20 dB on different algorithms using the overlapped block technique

Table 5: The effect of adding Gaussian noise with SNR = 25 dB on different algorithms using the non-overlapped block technique

Table 6: The effect of adding Gaussian noise with SNR = 20 dB on different algorithms using the non-overlapped block technique

Table 7: The effect of block size on different algorithms using the non-overlapped block technique

Table 8: The effect of block size on different algorithms using the overlapped block technique

Table 9: The PSNR performance for two algorithms