Video Stabilization with SIFT

Katie Zutter Olivia Zhao Sam Erickson


Method 2


To implement the Scale-Invariant Feature Transform (SIFT) algorithm for tracking the motion of features between consecutive video frames, we draw on ideas from Grundmann, Kwatra, & Essa [2]. In their article, they introduce a video stabilization method that first extracts SIFT features and then generates a stabilized video by removing unwanted motion. To achieve that, they minimize the first, second, and third derivatives of the resulting camera path. The method does not require reconstructing the 3D scene and therefore needs less computing power.
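The derivative-minimization idea can be written compactly as follows. This is a sketch in our own notation, assuming [2] refers to the authors' L1 optimal camera path formulation: p_t is one coordinate of the smoothed camera path at frame t, and w_1, w_2, w_3 are weights we introduce for illustration. The constraints that keep the crop window inside the original frame are omitted.

```latex
\min_{p_1,\dots,p_T}\;
  w_1 \sum_{t} \lvert p_{t+1} - p_t \rvert
+ w_2 \sum_{t} \lvert p_{t+2} - 2p_{t+1} + p_t \rvert
+ w_3 \sum_{t} \lvert p_{t+3} - 3p_{t+2} + 3p_{t+1} - p_t \rvert
```

The first term favors a static camera, the second a constant-velocity motion, and the third a constant-acceleration motion.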

In our project, we started by breaking a short shaky video into 1000 frames and then extracted SIFT features from those frames. At the same time, we also estimated the camera path across the video frames. After extracting the SIFT features, we optimized the camera path by cropping out unwanted pixels caused by shaky motion, minimizing the first, second, and third derivatives of the camera path as described in [2], and finally warping each frame to the same size. Figure 4 shows the original camera path as the red line and the resulting optimized camera path as the green line. In the original video, the camera path was very unstable; after stabilization, we get an almost perfectly stabilized view. After each frame had been transformed, we converted the 1000 resulting frames into a stable video. Figure 5 is an example comparing a resulting frame (on the right) with the original frame (on the left). The effect is more evident in video format, and we plan to submit the videos with the final project.
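The path estimation and smoothing steps above can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions, not the code used in the project: the matched point pairs are assumed to come from a SIFT matcher, the motion model is reduced to a per-frame translation, and the smoothing uses squared (L2) derivative penalties instead of the optimization in [2] so that it can be solved with plain Gaussian elimination. All function names and the default weights are hypothetical.

```python
from statistics import median

def estimate_translation(matches):
    """Median displacement of matched keypoints [((x0, y0), (x1, y1)), ...].
    The median is a simple way to ignore a few bad matches."""
    dx = median(x1 - x0 for (x0, _), (x1, _) in matches)
    dy = median(y1 - y0 for (_, y0), (_, y1) in matches)
    return dx, dy

def accumulate_path(steps):
    """Cumulative sum of per-frame displacements gives one camera path coordinate."""
    path, total = [], 0.0
    for s in steps:
        total += s
        path.append(total)
    return path

def difference_matrix(n, order):
    """Rows of the order-th finite-difference operator on n samples."""
    coeffs = {1: [-1, 1], 2: [1, -2, 1], 3: [-1, 3, -3, 1]}[order]
    rows = []
    for i in range(n - order):
        row = [0.0] * n
        for j, c in enumerate(coeffs):
            row[i + j] = c
        rows.append(row)
    return rows

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def smooth_path(path, w1=10.0, w2=1.0, w3=100.0):
    """Minimize |p - path|^2 + sum_k w_k |D^k p|^2 (an L2 stand-in, weights illustrative)."""
    n = len(path)
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for order, w in ((1, w1), (2, w2), (3, w3)):
        for row in difference_matrix(n, order):
            nz = [(j, v) for j, v in enumerate(row) if v]
            for i, vi in nz:
                for j, vj in nz:
                    a[i][j] += w * vi * vj  # accumulate w * D^T D
    return solve(a, path)
```

In this simplified model, the per-frame correction is the difference between the smoothed and the original path, which would then be applied as a shift when warping and cropping each frame.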


For regular video stabilization, the algorithm performed very well. In the horizontal trajectory (x-coordinates) especially, the algorithm managed to stabilize the path perfectly. In the vertical trajectory, the y-coordinates were slightly shaky at the very end. We believe this is because, at the end of the video, the object (the train head) was leaving the scene, which confused the algorithm so that it needed to recalibrate.


Figure 1. Original camera path in red, optimized camera path in green.

Method 2 - Video Results

Below is a comparison between the original videos and the optimized videos:


Method 2 - Further exploration

We further explored the algorithm with 360-degree videos. 360-degree video cameras have become popular for tourism, sports, and other activities.

Results from 360 degree video

Using the same algorithm, the plots below compare the original camera path with the optimized path.

As we can see, in the x-coordinates, the algorithm did not correct any of the shaky motion. We think this might be because we treated the video as regular video frames and therefore ignored the 360-degree metadata.
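One possible way the frame layout could matter is sketched below. We assume here, without having verified it for our footage, that the 360-degree frames are stored in a projection whose left and right edges join (as in an equirectangular layout). Under that assumption, a feature that crosses the seam appears to jump by almost the full frame width if the frame is treated as a regular image. The helper name wrapped_dx is hypothetical.

```python
def wrapped_dx(x0, x1, width):
    """Shortest signed horizontal displacement on a frame that wraps at `width`."""
    d = (x1 - x0) % width
    return d - width if d > width / 2 else d

# A feature moving 20 px to the right across the seam of a 3840 px wide frame:
naive = 10 - 3830                      # regular-frame displacement
wrapped = wrapped_dx(3830, 10, 3840)   # displacement that respects the wrap
```

Displacements like the naive value above would dominate a horizontal motion estimate, which is one candidate explanation for the uncorrected x-coordinates.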

In the y-coordinates, the resulting path was not fully flat. However, the algorithm corrected most of the very shaky motions.
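To put a number on how much shake remains in a path, one simple option is the mean absolute second difference of the path coordinates. This metric is our own illustrative choice and was not part of the reported results; the function name jitter is hypothetical.

```python
def jitter(path):
    """Mean absolute second difference of a 1D camera path.
    It is 0 for a straight line and grows as the path shakes more."""
    acc = [path[i + 2] - 2 * path[i + 1] + path[i] for i in range(len(path) - 2)]
    return sum(abs(a) for a in acc) / len(acc) if acc else 0.0
```

Comparing jitter(original_y) with jitter(optimized_y) would show how much of the shake was removed even when the optimized path is not fully flat.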


Below are the comparison videos: