The study of motion in computer vision lets us extract visual information from the spatial and temporal changes occurring in an image sequence. An image sequence is a series of n images, or frames, acquired at discrete time instants t_k = t_0 + k * dt, where dt is a fixed time interval and k = 0, 1, ..., n - 1. Assuming unvarying illumination conditions, changes in an image sequence are caused by relative motion between the camera and the scene: the viewing camera could move in front of a static scene, parts of the scene could move in front of a stationary camera, or, in general, both camera and objects could be moving with different motions. Visual motion is important for two main reasons. First, the apparent motion of objects on the image plane is a strong visual cue for understanding structure and 3D motion. Second, biological visual systems use visual motion to infer properties of the 3D world with little a priori knowledge of it. The following simple examples illustrate these points:
Example 1: Random Dot Sequences
Consider an image of random dots, generated by assigning a random grey level to each pixel. Consider a second image obtained by shifting a square central region of the first image by a few pixels, say to the right, and filling the gap thus created with more random dots. Displaying the two images in alternation at a sufficiently fast rate induces the viewer to see a square moving sideways back and forth against a steady background. When the two images are instead viewed side by side, so that each eye sees only one image, the viewer has the impression of a square floating against the background. The latter effect can be attributed to depth perception due to the disparity between parts of the two images.
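The construction of the two frames can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed procedure: the function name random_dot_pair, the image size, the square size, the shift, and the 8-bit grey-level range are all assumptions made for the example.

```python
import random


def random_dot_pair(size=64, square=24, shift=3, seed=0):
    """Return two random-dot images (lists of rows of grey levels).

    The second image is a copy of the first in which a central square
    region is shifted `shift` pixels to the right; the strip uncovered
    on the left of the square is filled with fresh random dots.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    first = [[rng.randrange(256) for _ in range(size)] for _ in range(size)]
    second = [row[:] for row in first]

    top = left = (size - square) // 2
    if left + square + shift > size:
        raise ValueError("shifted square does not fit inside the image")

    for r in range(top, top + square):
        # Copy the square region to its shifted position.
        for c in range(left, left + square):
            second[r][c + shift] = first[r][c]
        # Fill the gap left behind with new random dots.
        for c in range(left, left + shift):
            second[r][c] = rng.randrange(256)
    return first, second


first, second = random_dot_pair()
```

Showing first and second in alternation corresponds to the motion display described above, while presenting one image to each eye corresponds to the stereo display.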
Example 2: Computing Time-to-Collision
Consider a planar version of the pinhole camera model, and a vertical bar perpendicular to the optical axis, travelling towards the camera with constant velocity, as shown in Fig. 1. It is possible to compute the time, T, taken by the bar to reach the camera from image information alone, without knowing either the real size of the bar or its velocity in 3D space.
Let L denote the real size of the bar, V its constant velocity, and f the focal length of the camera. The origin of the reference frame is the projection centre. If the position of the bar on the optical axis is D(0) at time t = 0, its position at a later time t is given by D = D(0) - V * t. From Fig. 1 we see, by similar triangles, that the apparent size of the bar at time t on the image plane is given by l(t) = f * L / D.
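The claim in Example 2 can be checked numerically. Differentiating l(t) = f * L / D(t) with D(t) = D(0) - V * t gives dl/dt = f * L * V / D(t)^2, so the ratio l / (dl/dt) equals D / V, which is the time left before the bar reaches the camera; neither L nor V appears in the ratio. The Python sketch below approximates dl/dt with a finite difference between two frames. The function names and the numerical values are hypothetical and serve only to illustrate the idea.

```python
def apparent_size(f, L, D):
    """Pinhole projection: image size of a bar of real size L at distance D."""
    return f * L / D


def time_to_collision(l_prev, l_curr, dt):
    """Estimate T = l / (dl/dt) from two apparent sizes taken dt apart.

    dl/dt is approximated by a backward difference. For a bar moving at
    constant speed this estimate works out to D_prev / V, i.e. the time
    to collision measured from the earlier of the two frames.
    """
    dl_dt = (l_curr - l_prev) / dt
    return l_curr / dl_dt


# Hypothetical scene, used only to synthesise the two image measurements.
f, L, V, D0, dt = 1.0, 2.0, 1.0, 10.0, 0.1
l0 = apparent_size(f, L, D0)
l1 = apparent_size(f, L, D0 - V * dt)

# Only l0, l1 and dt are used here; L, V and D0 are not needed.
T = time_to_collision(l0, l1, dt)  # close to D0 / V = 10.0
```

The estimator sees only image measurements and the frame interval, which is the point of the example.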
The motion analysis problem can be divided into two subproblems: correspondence and reconstruction. The correspondence problem asks: which elements of a frame correspond to which elements of the next frame of the sequence? The reconstruction problem asks: given a number of corresponding elements, and possibly knowledge of the camera's intrinsic parameters, what can be said about the 3D motion and structure of the observed world? These problems are often approached using optical flow. Before discussing the motion analysis problem further, we make the following simplifying assumption:
Assumption
There is only one, rigid, relative motion between the camera and the observed scene, and the illumination conditions do not vary.
This assumption implies that the 3D objects observed cannot move with different motions. It is violated, for example, by sequences of football matches, motorway traffic, or busy streets, but is satisfied by, say, a sequence of a building viewed by a moving observer. The assumption also excludes flexible objects and nonrigid motions such as stretching and shearing; for example, deformable objects such as moving clothes are ruled out.
The motion field is the 2D vector field of velocities of the image points, induced by the relative motion between the viewing camera and the observed scene. The motion field can be thought of as the projection of the 3D velocity field on the image plane.
Let P = [X, Y, Z]^T be a 3D point in the camera reference frame, where the centre of projection is the origin, the optical axis is the Z axis, and f denotes the focal length. The image of a scene point P is the point p given by p = f * P / Z, that is, x = f * X / Z and y = f * Y / Z, with the third coordinate of p always equal to f.
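Since the motion field is the velocity of the image point p, it can be obtained by differentiating the projection equation with respect to time. The following is a sketch of that step, assuming the pinhole projection p = f P / Z and a fixed focal length f; the dot notation for time derivatives is introduced here for convenience.

```latex
\begin{align}
\mathbf{p} &= f\,\frac{\mathbf{P}}{Z},
  \qquad x = f\,\frac{X}{Z}, \quad y = f\,\frac{Y}{Z}, \\
\mathbf{v} &= \frac{d\mathbf{p}}{dt}
  = f\,\frac{Z\,\dot{\mathbf{P}} - \dot{Z}\,\mathbf{P}}{Z^{2}}, \\
v_x &= f\,\frac{Z\dot{X} - X\dot{Z}}{Z^{2}}, \qquad
v_y = f\,\frac{Z\dot{Y} - Y\dot{Z}}{Z^{2}}, \qquad
v_z = 0 .
\end{align}
```

The third component vanishes because the third coordinate of p is constant and equal to f, so the motion field is a 2D vector field on the image plane.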
In the case of pure translation, the relative motion between the viewing camera and the scene has no rotational component. Since the angular velocity w = 0, the 2D velocity components reduce to their translational terms.
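The purely translational case can be sketched in Python under explicitly stated assumptions. With no rotation, every scene point has the same 3D velocity relative to the camera; this velocity is written U = (Ux, Uy, Uz) here, a symbol introduced only for this illustration. Differentiating x = f * X / Z and y = f * Y / Z then gives vx = (f * Ux - x * Uz) / Z and vy = (f * Uy - y * Uz) / Z. When Uz is nonzero, the field is radial about the image point p0 = (f * Ux / Uz, f * Uy / Uz). The function names below are hypothetical.

```python
def translational_flow(x, y, Z, U, f=1.0):
    """Motion field at image point (x, y) for a scene point at depth Z.

    U is the common 3D velocity of scene points relative to the camera
    (an assumed parameterisation); there is no rotational component.
    """
    Ux, Uy, Uz = U
    vx = (f * Ux - x * Uz) / Z
    vy = (f * Uy - y * Uz) / Z
    return vx, vy


def radial_centre(U, f=1.0):
    """Image point about which the translational field is radial.

    Only defined when Uz != 0; for Uz < 0 (scene approaching the camera)
    the flow vectors point away from this point.
    """
    Ux, Uy, Uz = U
    if Uz == 0:
        raise ValueError("no finite radial centre when Uz is zero")
    return f * Ux / Uz, f * Uy / Uz


U = (1.0, 2.0, -4.0)                      # hypothetical relative velocity
vx, vy = translational_flow(0.5, 0.25, 8.0, U)
x0, y0 = radial_centre(U)
# Zero 2D cross product: the flow is parallel to the line joining p0 and p.
cross = vx * (0.25 - y0) - vy * (0.5 - x0)
```

Note that the depth Z only scales the length of the flow vector; its direction is fixed by the image position and the radial centre.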
Motion parallax arises when two 3D points project to the same 2D location. Let P and P' be two different 3D points with projections p and p'. The motion field at p, denoted v, consists of its components
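A numerical sketch can illustrate what is special about two points that project to the same image location: the rotational part of the motion field depends only on image position, so it cancels in the difference of the two flow vectors, and only a depth-dependent translational term remains. The Python code below assumes the rigid-motion convention dP/dt = -T - w x P, with T a translational velocity and w an angular velocity; this convention, the function names, and all numerical values are assumptions made for the illustration and are not taken from the text above.

```python
def cross(a, b):
    """3D cross product a x b."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def motion_field(P, T, w, f=1.0):
    """Motion field of the image of P, assuming dP/dt = -T - w x P."""
    X, Y, Z = P
    wxP = cross(w, P)
    Vx, Vy, Vz = (-T[i] - wxP[i] for i in range(3))
    # Time derivative of the pinhole projection (f X / Z, f Y / Z).
    vx = f * (Vx * Z - X * Vz) / Z ** 2
    vy = f * (Vy * Z - Y * Vz) / Z ** 2
    return vx, vy


def relative_flow(P, scale, T, w, f=1.0):
    """Flow difference between P and a point on the same ray.

    P2 = scale * P has the same projection as P but a different depth.
    """
    P2 = tuple(scale * c for c in P)
    v1 = motion_field(P, T, w, f)
    v2 = motion_field(P2, T, w, f)
    return v1[0] - v2[0], v1[1] - v2[1]


P = (1.0, 2.0, 4.0)
T = (0.5, 0.0, 1.0)
no_rotation = relative_flow(P, 2.0, T, (0.0, 0.0, 0.0))
with_rotation = relative_flow(P, 2.0, T, (0.1, -0.2, 0.3))
# The two results agree up to rounding: the rotation has no effect
# on the relative flow of coincident image points.
```

Under these assumptions the relative flow is nonzero whenever the two depths differ and the translation is not directed along the common ray, which is why it carries information about depth.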