
Video Transport for Low-latency Human-Human Collaboration

Anyone who has used videoconferencing tools, from simple desktop applications such as CU-SeeMe or NetMeeting to high-fidelity professional systems, quickly realizes that videoconferencing is not comparable to physical presence, nor even to a telephone call. While the conversants can see video images of each other, these are often of limited quality. Worse, the latency in the audio signal results in an unnatural "turn-taking" style of conversation that diminishes the quality of interaction and exaggerates the sense of distance.

In order to overcome these problems, we are developing a new research facility known as the Shared Reality Environment. The primary goal of this environment is to support the exchange of low-latency, high-fidelity audio and video streams between multiple users in different locations. Satisfying this goal for the video stream presents a number of difficulties.

A first approach is to use M-JPEG or MPEG encoded video. The problems here are cost and latency. MPEG hardware tends to be expensive, and while cost is less of an issue for M-JPEG, with current technology either method introduces at least 50 ms of latency for compression and decompression, on top of the image acquisition time. Avoiding compression leaves the option of transmitting raw data. For high-resolution, 30 fps video, this requires massive amounts of bandwidth. Even on 100 Mbps Ethernet, transmitting a single 640x480 frame at 24 bits per pixel takes approximately 100 ms, far longer than the 33 ms between frames at 30 fps.
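
The raw-video figures can be checked with a back-of-the-envelope calculation. The sketch below assumes an ideal link that delivers its full nominal rate with no protocol overhead, so it yields a lower bound on transmission time; the function names are illustrative only.

```python
def frame_bits(width, height, bits_per_pixel):
    """Size of one uncompressed frame, in bits."""
    return width * height * bits_per_pixel


def transmit_ms(bits, link_bps):
    """Time to push `bits` through a link of `link_bps`, in milliseconds.

    Assumes the full nominal line rate is available (no protocol overhead).
    """
    return 1000.0 * bits / link_bps


bits = frame_bits(640, 480, 24)          # 7,372,800 bits per frame
per_frame_ms = transmit_ms(bits, 100e6)  # about 73.7 ms on 100 Mbps
required_mbps = bits * 30 / 1e6          # about 221 Mbps for 30 fps

print(f"frame size: {bits / 8 / 1024:.0f} KiB")
print(f"time per frame at 100 Mbps: {per_frame_ms:.1f} ms")
print(f"rate needed for 30 fps: {required_mbps:.0f} Mbps")
```

Even under these ideal assumptions a single frame occupies the link for about 74 ms, and sustaining 30 fps would need more than twice the link's capacity. Real throughput is lower than the nominal rate, which is consistent with the figure of roughly 100 ms quoted above.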

The fact that much of the data in a video frame is redundant forms the basis of compression techniques. For example, a static background in a sequence of images may constitute the majority of each frame. Since our goal is to allow users to interact, we may simply remove the background in its entirety, and thus reduce encoding and decoding time. The remaining image components, if sufficiently small, may be transmitted as raw data without compression, thereby reducing overall latency. Key to this work is the ability to quickly locate an approximate bounding box of a person in a scene.
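
The text does not say how the bounding box is located. As one possible illustration, the sketch below assumes a static reference image of the empty scene and uses simple thresholded differencing against it; the function names, the threshold value, and the toy image are all hypothetical and are not taken from the Shared Reality Environment itself.

```python
def bounding_box(frame, background, threshold=30):
    """Return (top, left, bottom, right) of pixels that differ from the
    background by more than `threshold`, or None if nothing differs.

    `frame` and `background` are equal-sized grayscale images stored as
    lists of rows of integer intensities (0-255). Bounds are inclusive.
    """
    top = left = bottom = right = None
    for y, (row, bg_row) in enumerate(zip(frame, background)):
        for x, (pixel, bg_pixel) in enumerate(zip(row, bg_row)):
            if abs(pixel - bg_pixel) > threshold:
                if top is None:
                    top, left, bottom, right = y, x, y, x
                else:
                    bottom = y  # rows are visited in increasing order
                    left = min(left, x)
                    right = max(right, x)
    if top is None:
        return None
    return top, left, bottom, right


def crop(frame, box):
    """Extract the raw pixels inside an inclusive bounding box."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in frame[top:bottom + 1]]


# Toy 6x8 scene: uniform background with a bright 3x2 "person".
background = [[10] * 8 for _ in range(6)]
frame = [row[:] for row in background]
for y in range(2, 5):
    for x in range(3, 5):
        frame[y][x] = 200

box = bounding_box(frame, background)
region = crop(frame, box)
print(box)
print(len(region) * len(region[0]), "of", 6 * 8, "pixels to send")
```

Only the cropped region would need to be transmitted as raw data. A single pass over the frame keeps the cost of localization low, which matters here because any time spent finding the box adds directly to end-to-end latency.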

A. Xu, J. Cooperstock


Annual Report

Mon Jun 26 21:22:20 GMT 2000