Video Stabilization with SIFT

Katie Zutter, Olivia Zhao, Sam Erickson

Method 1

Before deciding to use SIFT, we explored alternative methods of video stabilization. One method that we explored in depth was the algorithm outlined in the MathWorks tutorial Video Stabilization Using Point Feature Matching [5], which is based on a paper of the same title by Shamsundar Kulkarni [3]. The benefit of this method is that it requires no prior knowledge of where a salient feature lies in the first frame of the video, unlike methods that track a salient feature and use that feature as an anchor point. Instead, the algorithm automatically searches for the background plane in the video sequence.

The first step of this algorithm is to read in two frames of the video and compute the pixel-wise difference between them. The following figure illustrates the pixel-wise difference by showing a red-cyan color composite, the first frame being red and the second frame being cyan.

Figure 1. Pixel-wise difference between two frames
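As a rough illustration of this first step, the sketch below computes a pixel-wise absolute difference and a red-cyan composite for two tiny grayscale frames. It is written in plain Python rather than Matlab, the frames are made-up nested lists, and the function names abs_difference and red_cyan_composite are hypothetical helpers, not part of the tutorial.

```python
def abs_difference(frame_a, frame_b):
    """Pixel-wise absolute difference of two equally sized grayscale frames."""
    return [[abs(a - b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]


def red_cyan_composite(frame_a, frame_b):
    """Build an RGB image with frame_a in red and frame_b in green and blue (cyan)."""
    return [[(a, b, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]


# Two made-up 2x2 grayscale frames with intensities in 0..255.
frame_a = [[10, 200], [50, 50]]
frame_b = [[10, 100], [80, 50]]

diff = abs_difference(frame_a, frame_b)
composite = red_cyan_composite(frame_a, frame_b)

print(diff)             # zero wherever the two frames agree
print(composite[0][0])  # equal channels: the pixel appears gray
print(composite[0][1])  # brighter in frame_a: the pixel is tinted red
```

Pixels where the two frames agree come out gray in the composite, while pixels that differ are tinted red or cyan, which is why camera shake shows up as colored fringes.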

The next step is to detect points of interest in both frames and determine the likely pairings between them. Points are detected with Matlab’s detectFASTFeatures function. To find corresponding points, extractFeatures extracts a Fast Retina Keypoint (FREAK) descriptor around each point, and the descriptors are matched using the Hamming distance. The following figures show corresponding points between the first and second frames. Points of interest in the first frame are displayed as red circles, while points of interest in the second frame are displayed as green plus signs. In the close-up image, each line joins a pair of corresponding points from the two frames.

Figure 2. Points of interest in two frames

Figure 3. Close-up of distance between points of interest in two frames
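To make the matching step concrete, here is a minimal sketch of matching binary descriptors by Hamming distance. It is plain Python, not the Matlab code from the tutorial. The 8-bit descriptors are toy values chosen for readability (real FREAK descriptors are much longer bit strings), and match_descriptors and its max_distance parameter are hypothetical names.

```python
def hamming(a, b):
    """Number of bit positions in which two binary descriptors differ."""
    return bin(a ^ b).count("1")


def match_descriptors(desc_a, desc_b, max_distance):
    """For each descriptor in desc_a, find its nearest neighbor in desc_b.

    A pairing is kept only if its Hamming distance is at most max_distance.
    Returns a list of (index_in_a, index_in_b, distance) tuples.
    """
    matches = []
    for i, da in enumerate(desc_a):
        j, dist = min(((j, hamming(da, db)) for j, db in enumerate(desc_b)),
                      key=lambda pair: pair[1])
        if dist <= max_distance:
            matches.append((i, j, dist))
    return matches


# Toy 8-bit descriptors for three points in each frame (assumed values).
desc_a = [0b10110010, 0b01001101, 0b11110000]
desc_b = [0b01001111, 0b10110011, 0b00001111]

matches = match_descriptors(desc_a, desc_b, max_distance=2)
print(matches)  # the third point in desc_a has no close partner and is dropped
```

Because the descriptors are binary, the distance is just an XOR followed by a bit count, which is what makes this kind of matching cheap. Pairings that survive the distance threshold can still be wrong, which motivates the outlier rejection in the next step.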

The next step is to eliminate noisy correspondences, much as RANSAC was used to eliminate noisy corresponding points for panorama images in HW4. Instead of RANSAC, this algorithm uses a variant called MSAC (M-estimator SAmple Consensus), which is invoked through the Matlab function estimateGeometricTransform.
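The difference between the two estimators lies in how a candidate transform is scored. RANSAC counts inliers, whereas MSAC sums a truncated squared error, so that inliers contribute their actual residual and outliers contribute a fixed penalty. The sketch below illustrates this in plain Python under simplifying assumptions: the model is a pure translation rather than the transform fitted by estimateGeometricTransform, every correspondence is tried as a hypothesis instead of sampling at random, and the data and function names are made up.

```python
def residuals_sq(matches, shift):
    """Squared distance between each shifted first-frame point and its partner."""
    dx, dy = shift
    return [(xa + dx - xb) ** 2 + (ya + dy - yb) ** 2
            for (xa, ya), (xb, yb) in matches]


def ransac_score(res_sq, threshold_sq):
    """RANSAC-style score: number of inliers, each counted equally."""
    return sum(1 for r in res_sq if r <= threshold_sq)


def msac_cost(res_sq, threshold_sq):
    """MSAC-style cost: inliers pay their squared error, outliers a fixed penalty."""
    return sum(min(r, threshold_sq) for r in res_sq)


def estimate_shift_msac(matches, threshold):
    """Return the translation hypothesis with the lowest MSAC cost."""
    threshold_sq = threshold ** 2
    best_shift, best_cost = None, float("inf")
    for (xa, ya), (xb, yb) in matches:  # one match defines one translation
        shift = (xb - xa, yb - ya)
        cost = msac_cost(residuals_sq(matches, shift), threshold_sq)
        if cost < best_cost:
            best_shift, best_cost = shift, cost
    return best_shift, best_cost


# Made-up correspondences: three exact, one slightly noisy, one gross outlier.
matches = [((0, 0), (5, 2)), ((10, 0), (15, 2)), ((0, 10), (5, 12)),
           ((10, 10), (16, 12)), ((3, 7), (40, -20))]

best_shift, best_cost = estimate_shift_msac(matches, threshold=2)
inliers_a = ransac_score(residuals_sq(matches, (5, 2)), 4)
inliers_b = ransac_score(residuals_sq(matches, (6, 2)), 4)
print(best_shift, best_cost)
print(inliers_a, inliers_b)  # inlier counts alone cannot separate these two shifts
```

In this toy data the shifts (5, 2) and (6, 2) have the same number of inliers, so an inlier count cannot choose between them, while the MSAC cost prefers the shift that fits the inliers more tightly.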

Finally, smoothing is applied and the full video is rendered. We successfully replicated the algorithm and ran it on a shaky video of a train taken from YouTube. However, we ultimately decided not to use this method of image stabilization; see the Difficulties section for the issues we encountered, which informed our decision to continue exploring alternative methods. Although exploring this method ended up being a detour from our main work, it was worthwhile because it gave us insight into the general intuition behind implementing image stabilization in Matlab and into the available functions in the Image Processing Toolbox.

Figure 4. Screenshot of one frame in the stabilized video

Method 1 - Results

Below is a comparison between the original video and the stabilized video: