Video Stabilization with SIFT

Katie Zutter Olivia Zhao Sam Erickson

Experimentation - Overview

To validate and measure the accuracy of our SIFT implementation, a variety of videos containing both intentional and accidental motion will be provided as inputs. There are three main categories of intentional motion:

  • Motion caused by traveling objects being filmed, e.g. cars driving down a street

  • Motion resulting from the camera panning to capture different scene perspectives, e.g. a panning scene of a landscape

  • Motion arising from both moving objects and camera panning, e.g. a panning scene of walking pedestrians

The algorithm will be tested on all three categories, with the goal of measuring the following for each:

  • How often frames are incorrectly shifted because intentional motion was mistaken for shaky, accidental motion

  • Conversely, how often accidental motion is wrongly treated as intentional motion and left uncorrected

Finally, the measurements for each video category will be compared with one another to detect any discrepancies in the algorithm's performance across the three scenarios.
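The two error rates above could be tallied with a small evaluation harness. The sketch below assumes per-frame ground-truth labels and per-frame algorithm decisions are available as lists of strings; the function name and labels are hypothetical, not part of the cited method.

```python
def error_rates(ground_truth, decisions):
    """Compare per-frame ground-truth labels ("intentional" or
    "accidental") against the algorithm's classification and return the
    two rates of interest: the fraction of frames wrongly shifted
    (intentional motion corrected as if accidental) and the fraction
    wrongly ignored (accidental motion passed through as intentional).
    Hypothetical evaluation helper for illustration only."""
    wrongly_shifted = sum(
        1 for truth, guess in zip(ground_truth, decisions)
        if truth == "intentional" and guess == "accidental")
    wrongly_ignored = sum(
        1 for truth, guess in zip(ground_truth, decisions)
        if truth == "accidental" and guess == "intentional")
    n = len(ground_truth)
    return wrongly_shifted / n, wrongly_ignored / n
```

Running this per video, then averaging within each category, would give directly comparable numbers across the three scenarios.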


Proposed Implementation

The algorithm implementation roughly follows the methodology outlined by Battiato, Gallo, Puglisi, and Scellato1. The three stabilization steps laid out in the paper are feature matching, model fitting, and motion filtering.

In the paper, Euclidean distance and a distance ratio are used to match features. This is a proven, reliable measure, but it may be beneficial to experiment with other distance functions in an attempt to improve accuracy. We will also examine how changing the threshold for the distance ratio affects the correctness of feature matches. From the feature matches, a set of local motion vectors can be generated and fit to a model.
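The distance-ratio matching step can be sketched as follows. This is a minimal brute-force illustration in pure Python, not the paper's implementation: descriptors are assumed to arrive as equal-length numeric vectors, and a production version would use an approximate-nearest-neighbour index rather than sorting all distances.

```python
import math

def match_features(desc_a, desc_b, ratio_threshold=0.8):
    """Match descriptors from frame A to frame B using Euclidean distance
    and a distance-ratio test: a match is kept only when the nearest
    neighbour is clearly closer than the second-nearest. Returns a list
    of (index_in_a, index_in_b) pairs."""
    def dist(u, v):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

    matches = []
    for i, d in enumerate(desc_a):
        # Rank frame-B descriptors by distance to this descriptor.
        ranked = sorted(range(len(desc_b)), key=lambda j: dist(d, desc_b[j]))
        best, second = ranked[0], ranked[1]
        # Ratio test: accept only unambiguous matches.
        if dist(d, desc_b[best]) < ratio_threshold * dist(d, desc_b[second]):
            matches.append((i, best))
    return matches
```

Raising `ratio_threshold` admits more (but noisier) matches; lowering it keeps only the most distinctive ones, which is exactly the trade-off the threshold experiment above would measure.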

To construct a model, we need to decide which features to calculate and which to experiment with. Commonly used object features such as first moments, orientation axes, and roundness will likely be useful, but introducing additional features could provide performance gains. Methods to discard incorrect matches based on the Euclidean distance and angle of rotation between expected and real points should also be investigated.
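One simple way to discard incorrect matches on distance and angle is to compare each local motion vector against the median motion, as sketched below. The tolerances and function name are illustrative assumptions; the paper's own filtering compares expected against real point positions during model fitting.

```python
import math

def filter_matches(points_a, points_b, dist_tol=5.0, angle_tol=0.5):
    """Discard matched keypoint pairs whose local motion vector deviates
    from the median motion, in length (pixels) or direction (radians),
    by more than the given tolerances. points_a and points_b are matched
    (x, y) locations in consecutive frames; returns the indices of the
    matches that are kept. Illustrative outlier-rejection sketch."""
    vectors = [(bx - ax, by - ay)
               for (ax, ay), (bx, by) in zip(points_a, points_b)]
    lengths = sorted(math.hypot(vx, vy) for vx, vy in vectors)
    angles = sorted(math.atan2(vy, vx) for vx, vy in vectors)
    med_len = lengths[len(lengths) // 2]
    med_ang = angles[len(angles) // 2]

    kept = []
    for i, (vx, vy) in enumerate(vectors):
        # Wrap the angular difference into [-pi, pi] before comparing.
        ang_diff = (math.atan2(vy, vx) - med_ang + math.pi) % (2 * math.pi) - math.pi
        if (abs(math.hypot(vx, vy) - med_len) <= dist_tol
                and abs(ang_diff) <= angle_tol):
            kept.append(i)
    return kept
```

Matches that survive this filter would then feed the model-fitting step, since a few gross outliers (e.g. features on a moving car) can otherwise dominate the estimated global motion.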

Finally, motion filtering must be applied to distinguish intentional from accidental motion. Global motion vectors and integrated motion vectors can be used in conjunction to make the filtered frame motion vector approximate the accidental motion component as closely as possible.
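The motion-filtering idea can be illustrated with a simple low-pass filter over the integrated motion: the smooth component is taken as intentional, and the residual as accidental jitter to be corrected. This is a sketch of the general technique, not the paper's exact filter; the smoothing factor `alpha` is an assumed parameter.

```python
def stabilize_motion(global_vectors, alpha=0.9):
    """Accumulate per-frame global motion vectors into an integrated
    motion curve, low-pass filter it with an exponential moving average
    (the intentional component), and return per-frame correction vectors
    equal to the negated residual jitter. Illustrative sketch only."""
    corrections = []
    acc_x = acc_y = 0.0        # integrated (accumulated) motion
    smooth_x = smooth_y = 0.0  # low-pass estimate of intentional motion
    for gx, gy in global_vectors:
        acc_x += gx
        acc_y += gy
        # The EMA tracks slow pans; fast jitter is left in the residual.
        smooth_x = alpha * smooth_x + (1 - alpha) * acc_x
        smooth_y = alpha * smooth_y + (1 - alpha) * acc_y
        # Shift each frame by the jitter component only.
        corrections.append((smooth_x - acc_x, smooth_y - acc_y))
    return corrections
```

With a larger `alpha`, more of each frame's motion is attributed to jitter and corrected away; a smaller `alpha` lets the output follow the camera more closely, which is the trade-off the experiments above would probe.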