Monocular Optical Flow For Real-Time Vision Systems

STEPHEN M. BENOIT

Graduate Student - Master's

Computer Vision - Artificial Perception Lab


See full image of optical flow field of moving hand.
See full image of optical flow field magnitude and figure/ground separation.

Overview:

While laser rangefinders offer high-resolution depth maps, the apparatus is sometimes too expensive for the needs of smaller industry groups. A low-cost alternative is to produce lower-resolution depth maps using a structure-from-motion algorithm with a common video camera and digitizing hardware. In an efficient scheme, an operator would present the target object to the video camera and move it according to the computer's suggestions for new views. While the operator manipulates the object, the computer would automatically track the target and produce new pose estimates and depth maps in near real time. These coarse depth maps can then be processed by generalized view integration algorithms that fuse the data into more detailed depth maps.

Shown here is live video (1.4 FPS) overlaid with a red flow field and surrounded by a green tracking window. The magnitude of the flow vectors is rendered at left: brighter pixels indicate faster motion, and black indicates no motion. The green blob indicates the result of figure/ground separation.
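The rendering described above can be illustrated with a minimal sketch. The page does not say how the figure/ground separation is computed, so the approach here (thresholding the flow magnitude, then taking the bounding box of the moving pixels as a tracking window) is an assumption for illustration only. The names flow_magnitude, figure_mask, bounding_window and the threshold value are hypothetical.

```python
import math

def flow_magnitude(flow):
    """Return per-pixel speed for a 2-D grid of (u, v) flow vectors."""
    return [[math.hypot(u, v) for (u, v) in row] for row in flow]

def figure_mask(magnitude, threshold=0.5):
    """Mark pixels moving faster than `threshold` (pixels/frame) as figure.

    Assumption: a plain magnitude threshold stands in for whatever
    figure/ground separation the original system used.
    """
    return [[m > threshold for m in row] for row in magnitude]

def bounding_window(mask):
    """Smallest (left, top, right, bottom) box around all figure pixels."""
    coords = [(x, y)
              for y, row in enumerate(mask)
              for x, is_figure in enumerate(row)
              if is_figure]
    if not coords:
        return None  # nothing is moving
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))

# Toy 4x4 flow field: a 2x2 patch moves right by 2 pixels, the rest is still.
flow = [[(0.0, 0.0)] * 4 for _ in range(4)]
for y in (1, 2):
    for x in (1, 2):
        flow[y][x] = (2.0, 0.0)

mag = flow_magnitude(flow)      # brightness of the magnitude image
mask = figure_mask(mag)         # the "blob" of moving pixels
window = bounding_window(mask)  # the tracking window around the blob
print(window)
```

In this toy field only the 2x2 moving patch exceeds the threshold, so the window encloses exactly that patch.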

Equipment:

See full image of hardware setup.

Our philosophy is to use commonly available, low-cost approaches on workstations. Investing in custom computing hardware does not always pay off: desktop processing power rapidly makes hand-built machines obsolete, and desktop workstations are easier to maintain.

100 MHz SGI INDY
MIPS R4600 CPU with MIPS R4610 FPU
32 Mbytes main memory
built-in frame-grabber
pre-installed frame-grabber drivers
IRIX 5.2 operating system

For our experiments, we used off-the-shelf components on a 100 MHz SGI INDY workstation. The video frame-grabber is built in, and the camera perched above the monitor is inexpensive. Although the hardware can accommodate 640x480 8-bit video at 30 frames per second, our software is computationally demanding. We therefore use video images of 320x240 pixels, and the software grabs a new image only when it finishes processing the previous frame pair, typically at around 2 FPS. When tracking is enabled, however, the software predicts where the object will move and limits its attention to that region, ignoring the background. With this strategy, we have sustained 5 FPS.
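The idea of limiting attention to a predicted region can be sketched as follows. The page does not describe the predictor, so this example assumes a simple constant-velocity model: the current window is shifted by the mean flow measured inside it, padded by a safety margin, and clamped to the 320x240 image. The function names and the margin value are hypothetical.

```python
def predict_window(window, mean_flow, margin, width=320, height=240):
    """Shift a (left, top, right, bottom) window by the mean flow,
    grow it by `margin` pixels, and clamp it to the image bounds.

    Assumption: constant-velocity prediction; the original system's
    predictor is not described in the text.
    """
    left, top, right, bottom = window
    dx, dy = mean_flow
    new_left = max(0, int(round(left + dx)) - margin)
    new_top = max(0, int(round(top + dy)) - margin)
    new_right = min(width - 1, int(round(right + dx)) + margin)
    new_bottom = min(height - 1, int(round(bottom + dy)) + margin)
    return (new_left, new_top, new_right, new_bottom)

def window_fraction(window, width=320, height=240):
    """Fraction of the image's pixels that fall inside the window."""
    left, top, right, bottom = window
    area = (right - left + 1) * (bottom - top + 1)
    return area / (width * height)

# An 80x80 window whose contents move 6 pixels right and 4 pixels up.
current = (100, 80, 179, 159)
predicted = predict_window(current, mean_flow=(6.0, -4.0), margin=8)
fraction = window_fraction(predicted)
print(predicted, fraction)
```

In this made-up example the predicted window covers 12% of the image, which illustrates why restricting the flow computation to the window reduces the work per frame pair. The actual speed-up depends on the object's size and on fixed per-frame costs, so this number should not be read as an explanation of the reported frame rates.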

Visit my home page.
Visit McGill's Center for Intelligent Machines (CIM).
Visit the Artificial Perception Lab.
Send me mail.