3D motion estimation comparison

Comparison of three methods for estimation of 3D object motion

Goal

This demonstration concerns the automatic analysis of moving objects in a video sequence. The goal is to estimate the 3D motion of a rigid 3D object between two time instants.

Approaches

Three methods are compared:

1. Motion from Point Correspondences:

The first method assumes:

2D point correspondences in two successive images are known
2D object segmentation mask is known
Object is rigid
Camera can be approximated by orthogonal projection

and estimates the 3D motion with an approach based on epipolar geometry. In the special case of orthogonal projection, only one of the six 3D rigid motion parameters can be estimated: the tilt of the object rotation axis, which defines the orientation of the rotation axis projected into the image plane [LeSaux99].
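The epipolar step can be illustrated with a minimal NumPy sketch. It is not the algorithm of [LeSaux99]; it assumes a small inter-frame rotation ω and translation t, under which a point (x, y) with unknown depth Z moves to x' = x + t_x + ω_y Z - ω_z y and y' = y + t_y + ω_z x - ω_x Z. Eliminating Z gives ω_x (x' - x) + ω_y (y' - y) + ω_z (ω_x y - ω_y x) - ω_x t_x - ω_y t_y = 0, an affine epipolar constraint a x' + b y' + c x + d y + e = 0 whose coefficients (a, b) are proportional to (ω_x, ω_y). The sketch fits this constraint to all correspondences by orthogonal regression and returns the orientation of (a, b), i.e. the tilt modulo π. The function name `rotation_axis_tilt` is hypothetical.

```python
import numpy as np


def rotation_axis_tilt(pts1, pts2):
    """Tilt (radians, modulo pi) of the projected rotation axis.

    Hypothetical helper. pts1, pts2: (N, 2) arrays of corresponding image
    points (x, y) in the first and second image, N >= 4. Assumes orthogonal
    projection and a small inter-frame rotation.
    """
    # Each correspondence becomes a point (x', y', x, y) in R^4.
    X = np.hstack([pts2, pts1]).astype(float)
    # The affine epipolar constraint a*x' + b*y' + c*x + d*y + e = 0 is a
    # hyperplane in R^4; fit it by orthogonal regression on centred data.
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    # Hyperplane normal = right singular vector of the smallest singular value.
    a, b = vt[-1, :2]
    # (a, b) is the normal of the epipolar lines in the second image; for a
    # small rotation it is parallel to the projected rotation axis.
    return np.arctan2(b, a) % np.pi
```

With noise-free correspondences the result deviates from the true axis orientation only by a second-order term in the rotation angle; with real block-matching vectors the least-squares fit averages out measurement noise.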

The experiments are carried out fully automatically. The point correspondences are obtained by selecting reliable, spatially well-distributed vectors from a dense displacement vector field. The 2D object segmentation mask is derived from change detection using the public software COST 211 Analysis Model [COST].
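One way to pick reliable, spatially well-distributed vectors is sketched below under assumptions of our own: the dense field comes with a per-pixel matching cost (lower means more reliable), the image is tiled into `cell` x `cell` squares, and at most one vector per tile is kept, namely the one with the lowest cost among object pixels, provided that cost is below an optional threshold `max_cost`. The function and parameter names are hypothetical; the tiling enforces the spatial distribution, the cost criterion the reliability.

```python
import numpy as np


def select_correspondences(flow, cost, cell=16, max_cost=np.inf, mask=None):
    """Keep at most one reliable displacement vector per cell x cell tile.

    Hypothetical helper. flow: (H, W, 2) array of (dx, dy); cost: (H, W)
    matching error, lower = more reliable; mask: optional (H, W) object
    mask. Returns pts1, pts2 as (N, 2) arrays of (x, y).
    """
    H, W = cost.shape
    m = None if mask is None else np.asarray(mask, dtype=bool)
    pts1, pts2 = [], []
    for y0 in range(0, H, cell):
        for x0 in range(0, W, cell):
            c = cost[y0:y0 + cell, x0:x0 + cell].astype(float)  # copy of the tile
            if m is not None:
                # Exclude background pixels from the search.
                c[~m[y0:y0 + cell, x0:x0 + cell]] = np.inf
            iy, ix = np.unravel_index(np.argmin(c), c.shape)
            if not c[iy, ix] < max_cost:  # also rejects tiles without object pixels
                continue
            y, x = y0 + iy, x0 + ix
            pts1.append((x, y))
            pts2.append((x + flow[y, x, 0], y + flow[y, x, 1]))
    return (np.array(pts1, dtype=float).reshape(-1, 2),
            np.array(pts2, dtype=float).reshape(-1, 2))
```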

2. Motion from Shape:

The second method assumes:

3D object shape is known
2D object segmentation mask is known
Object is rigid
Scene illumination is diffuse
Camera can be approximated by orthogonal projection

and estimates the 3D motion with an optical flow approach. Using the known 3D shape, five of the six 3D rigid motion parameters are introduced into the optical flow equations. Because of the orthogonal projection assumption, the sixth parameter, the translation in the viewing direction, cannot be estimated [LeSaux99].
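The linear structure of such an optical flow approach can be sketched as follows. This is an illustration under stated assumptions, not necessarily the formulation of [LeSaux99]: the rotation is small, so a surface point at image position (x, y) with known depth Z moves by u = t_x + ω_y Z - ω_z y and v = t_y + ω_z x - ω_x Z. Inserting this into the brightness constancy constraint I_x u + I_y v + I_t = 0 (motivated by the diffuse-illumination assumption) gives one linear equation per object pixel in the five unknowns (ω_x, ω_y, ω_z, t_x, t_y), solved here by least squares. The translation t_z never appears, which is exactly the non-observability noted above. The derivative images `Ix`, `Iy`, `It` are assumed inputs and the function name is hypothetical.

```python
import numpy as np


def motion_from_shape(Ix, Iy, It, Z, mask):
    """Least-squares estimate of (wx, wy, wz, tx, ty) from a known shape.

    Hypothetical helper. Ix, Iy, It: (H, W) spatial and temporal image
    derivatives; Z: (H, W) depth of the known 3D shape; mask: (H, W) object
    mask. Assumes orthogonal projection, a small rotation about the
    coordinate origin and brightness constancy.
    """
    H, W = Z.shape
    y, x = np.mgrid[0:H, 0:W].astype(float)  # pixel coordinates
    m = np.asarray(mask, dtype=bool)
    ix, iy, it, z, xs, ys = (a[m] for a in (Ix, Iy, It, Z, x, y))
    # Ix*u + Iy*v + It = 0 with u = tx + wy*Z - wz*y and v = ty + wz*x - wx*Z
    # is linear in the five unknowns; tz drops out under orthogonal projection.
    A = np.column_stack([-iy * z, ix * z, iy * xs - ix * ys, ix, iy])
    params, *_ = np.linalg.lstsq(A, -it, rcond=None)
    return params
```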

The experiments are carried out fully automatically. The 2D object segmentation mask is derived from change detection using the public software COST 211 Analysis Model [COST]. A rough approximation of the 3D object shape is calculated from the 2D segmentation mask with a distance transform, assuming a convex shape.
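A minimal sketch of this shape approximation, assuming SciPy's `scipy.ndimage.distance_transform_edt` as the distance transform: the Euclidean distance of each object pixel to the nearest background pixel serves as a relative height. The optional `rounded` mapping sqrt(d (2R - d)), with R the largest distance, is a hypothetical refinement that turns a disk-shaped mask into a hemisphere rather than a cone; the function name is also hypothetical.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt


def shape_from_mask(mask, rounded=True):
    """Rough relative height map of a convex object from its 2D mask.

    Hypothetical helper. mask: (H, W) array, nonzero inside the object.
    Background pixels get height 0.
    """
    # Distance of each object pixel to the nearest background pixel.
    d = distance_transform_edt(mask)
    if not rounded:
        return d
    R = d.max()
    # Circular profile: for a disk of radius R this approximates a hemisphere.
    return np.sqrt(d * (2.0 * R - d))
```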

3. Motion from Shading:

The third method assumes:

Displacement vector field is known
2D object segmentation mask is known
Object is rigid and ball-like
Main scene illumination is in viewing direction
Camera can be approximated by orthogonal projection

and fully automatically estimates the tilt of the object rotation axis (see the first method) by exploiting only the object shading [Stau99].
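The following sketch only illustrates why shading carries information about the rotation axis; it is a simplified model built on the assumptions listed above, not the method of [Stau99]. Assume a Lambertian ball of constant albedo ρ lit from the viewing direction, so that a surface point has brightness ρ n_z. A small rotation ω changes the normal by ω × n, hence the brightness of a tracked point changes by ΔI ≈ ρ (ω_x n_y - ω_y n_x). For a ball, (n_x, n_y) is proportional to the offset from the object centre, so a least-squares fit of ΔI, measured along the known displacement vectors, against (y - c_y, -(x - c_x)) yields a vector proportional to (ω_x, ω_y), whose orientation is the tilt. The nearest-pixel lookup, the centre taken as the mask centroid and the function name are all assumptions.

```python
import numpy as np


def tilt_from_shading(I1, I2, flow, mask):
    """Tilt (radians, modulo pi) from shading changes of a ball-like object.

    Hypothetical helper. I1, I2: (H, W) grey-level images; flow: (H, W, 2)
    known displacement (dx, dy) from I1 to I2; mask: (H, W) object mask.
    Assumes a Lambertian ball lit from the viewing direction, small rotation.
    """
    H, W = I1.shape
    y, x = np.mgrid[0:H, 0:W]
    m = np.asarray(mask, dtype=bool)
    cy, cx = y[m].mean(), x[m].mean()  # object centre from the mask
    # Brightness change of each point along its displacement (nearest pixel).
    x2 = np.clip(np.rint(x + flow[..., 0]).astype(int), 0, W - 1)
    y2 = np.clip(np.rint(y + flow[..., 1]).astype(int), 0, H - 1)
    dI = (I2[y2, x2].astype(float) - I1)[m]
    # Model: dI ~ a*(y - cy) - b*(x - cx) with (a, b) proportional to (wx, wy).
    A = np.column_stack([(y - cy)[m], -(x - cx)[m]])
    (a, b), *_ = np.linalg.lstsq(A, dI, rcond=None)
    return np.arctan2(b, a) % np.pi
```

If the real illumination deviates from the viewing direction, the measured ΔI contains an additional term that this model ignores, which is one plausible source of the bias reported for this method below.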

Experimental Results

The experimental results are derived fully automatically. The input data required by the algorithms are also derived automatically, as follows:

Object segmentation: By COST 211 Analysis Model [COST]
3D shape estimation: By a simple distance transform of the 2D segmentation mask
Point correspondences: By block matching between two images and selection of reliable points
Displacement vector field: By block matching between two images
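Block matching can be sketched as an exhaustive search: for each non-overlapping block of the first image, every integer shift within plus or minus `search` pixels is tried in the second image, and the shift with the smallest sum of absolute differences (SAD) is kept. Block size, search range and the SAD criterion are assumptions for illustration; which block matcher was actually used is not specified. The returned SAD is one possible reliability measure when choosing reliable points.

```python
import numpy as np


def block_matching(I1, I2, block=8, search=4):
    """Full-search block matching with the SAD criterion.

    Hypothetical helper. Returns vec: (H//block, W//block, 2) integer
    displacements (dx, dy) from I1 to I2, and cost: the SAD of the best match.
    """
    I1 = np.asarray(I1, dtype=float)
    I2 = np.asarray(I2, dtype=float)
    H, W = I1.shape
    nby, nbx = H // block, W // block
    vec = np.zeros((nby, nbx, 2), dtype=int)
    cost = np.full((nby, nbx), np.inf)
    for by in range(nby):
        for bx in range(nbx):
            y0, x0 = by * block, bx * block
            ref = I1[y0:y0 + block, x0:x0 + block]
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    ys, xs = y0 + dy, x0 + dx
                    if ys < 0 or xs < 0 or ys + block > H or xs + block > W:
                        continue  # candidate block would leave the image
                    sad = np.abs(I2[ys:ys + block, xs:xs + block] - ref).sum()
                    if sad < cost[by, bx]:
                        cost[by, bx] = sad
                        vec[by, bx] = (dx, dy)
    return vec, cost
```

A dense field would then be obtained by assigning each block's vector to all of its pixels or by interpolating between blocks; that step is omitted here.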

[Result videos in four columns: original sequence, motion from points, motion from shape, motion from shading]

The results visualize the projection of the 3D object rotation axis into the 2D image plane. For all three methods, the estimated motion qualitatively follows the motion of the object. The results show that the underlying assumptions are generally valid for simple video scenes.
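For such a visualization, the projected axis can be drawn as a line segment over the object. The sketch below assumes, for illustration only, that the segment passes through the centroid of the segmentation mask and that the tilt is measured from the image x axis; the function name and the `length` parameter are hypothetical.

```python
import numpy as np


def axis_segment(mask, tilt, length):
    """Endpoints ((x0, y0), (x1, y1)) of the projected rotation axis.

    Hypothetical helper. The segment has the given length, passes through
    the mask centroid and is oriented at `tilt` radians from the x axis.
    """
    ys, xs = np.nonzero(mask)
    cx, cy = xs.mean(), ys.mean()  # object centroid
    dx, dy = 0.5 * length * np.cos(tilt), 0.5 * length * np.sin(tilt)
    return (cx - dx, cy - dy), (cx + dx, cy + dy)
```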

However, the first method (based on point correspondences) shows the highest temporal stability. The second method (based on 3D shape) seems to fail for some images. The third method (based on shading) always gives slightly biased results, due to the assumption that the main illumination is in the viewing direction.
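Temporal stability can be quantified, for example, by the spread of the estimated tilt over the frames of a sequence. Because a tilt is an axial angle (θ and θ + π describe the same axis), the sketch below doubles the angles, computes the circular standard deviation and halves the result. This measure is an assumption for illustration and not necessarily the criterion behind the comparison above; the function name is hypothetical.

```python
import numpy as np


def axial_spread(tilts):
    """Circular standard deviation (radians) of tilt angles defined modulo pi.

    Hypothetical helper. Returns 0 for identical axes and grows as the
    estimates scatter; it becomes inf for uniformly spread axes.
    """
    t = np.asarray(tilts, dtype=float)
    z = np.exp(2j * t)  # doubling maps t and t + pi to the same point
    R = np.clip(np.abs(z.mean()), 0.0, 1.0)  # mean resultant length
    return np.sqrt(max(0.0, -2.0 * np.log(R))) / 2.0
```

For small deviations the value approaches the ordinary standard deviation of the tilts, e.g. about 0.05 rad for two estimates at plus and minus 0.05 rad.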

References

[COST]: COST 211quat Analysis Model
[LeSaux99]: B. Le Saux: "Estimation du mouvement d'un objet avec un modèle de caméra simplifié" (Estimation of object motion with a simplified camera model), DEA internship report, IRISA/INRIA, Rennes, France, September 1999.
[Stau99]: J. Stauder: "Object Rotation Axis from Shading", ICASSP 1999, 15-19 March 1999, Phoenix, Arizona, USA.
