Current projects are divided into broad categories of
multidisciplinary collaborations,
high-fidelity communications,
human-computer interaction, and
camera-projector systems.
Open Orchestra
The orchestral training of professional and semi-professional musicians
and vocalists requires expensive resources that are not always available
when and where they are needed even if the funding for them were made
available. What is needed is the musical equivalent of an aircraft
simulator that gives the musician or vocalist the very realistic
experience of playing or singing with an orchestra. The purpose of
making this experience available through a next generation
network-enabled platform is to provide the extensive tools and resources
necessary at very low cost and wherever there is access to a high speed
network.
Health Services Virtual Organization
The HSVO aims to create a sustainable research platform for experimental development of shared ICT-based health services. This includes support for patient treatment planning as well as team and individual preparedness in the operating room, emergency room, general practice clinics, and patients' bedsides. In the context of the Network-Enabled Platforms program, the project seeks to offer such support to distributed communities of learners and health-care practitioners. Achieving these goals entails the development of tools for simultaneous access to the following training and collaboration resources: remote viewing of surgical procedures (or cadaveric dissections), virtual patient simulation involving medical mannequins and software simulators, access to 3D anatomical visualization resources, and integration of these services with the SAVOIR middleware along with the Argia network resource management software.
Underwater High Definition Video Camera Platform
We are working on the development of Web services software that
matches a common set of underwater video camera control inputs and
video stream outputs to the bandwidth available to a particular
scientist and allows scientists to collaborate through sharing the
same underwater view in real time. We will then produce a web-based
video camera user interface that makes use of the controls and
features available through these web services. In addition, we will
test an existing automated event detection algorithm for possible
integration into the "live" system.
3D Visualization and Gestural Interaction with Multimodal Neurological Data
This project deals with the challenges of medical image visualization,
in particular within the domain of neurosurgery. We wish to provide
an effective means of visualizing and interacting with data of the
patient's brain, in a manner that is natural to surgeons, for
training, planning, and surgical tasks. This entails three
fundamental objectives: advanced scientific visualization, robust
recognition of an easily learned and usable set of input gestures for
navigation and control, and real-time communication of the data
between multiple participants to permit effective understanding and
interpretation of the contents. The required expertise to accomplish
these tasks spans the areas of neurosurgery, human-computer
interaction, image processing, visualization, and network communications.
Mobile Game Device for Amblyopia Treatment
Amblyopia is a visual disorder affecting a significant proportion of
the population. We are developing a prototype device for assessment
and treatment of this condition, based on a modified game application
running on a compact autostereoscopic display platform. By sending a
calibrated "balanced-point" representation to both eyes, we aim for a
therapeutic process that gradually engages the weaker eye in the
visual process. The adaptation of this approach from a
lab-based and controlled environment to a portable device for daily
use has the potential to make amblyopia treatment more accessible.
Enhanced Virtual Presence and Performance
This project will enhance the next generation of virtual presence and
live performance technologies in a manner that supports the
task-specific demands of communication, interaction, and
production. The goals are to: improve the functionality, usability,
and richness of the experience; support use by multiple people,
possibly at multiple locations, engaged in work, artistic performance,
or social activities; and avoid inducing greater fatigue than the
alternative (non-mediated) experience. This work builds on recent
activities in Shared Spaces and the World
Opera Project.
World Opera
Can opera be performed if the opera singers are standing on different
stages in different time zones in different countries? This question
is at the heart of the World Opera
Project, a planned joint, real-time live opera performance to take
place simultaneously in several Canadian, U.S. and European
cities. The project is envisioned as a worldwide opera house located
in cyberspace.
Adaptive streaming for Interactive Mobile Audio
This work involves evaluation of audio codec quality in the context of
end-to-end network transmission systems, development of adaptive
streaming protocols for wireless audio with low latency and high
fidelity characteristics, and testing of these protocols in real-world
settings. Our freely downloadable streaming engine,
nStream,
is available for Linux, OS X, and Gumstix platforms.
Augmented Reality Board Games
Natural Interactive Walking (aka Haptic Snow)
This project is based on the synthesis of ground textures to create
the sensation of walking on different surfaces (e.g. on snow, sand,
and through water). Research issues involve sensing and actuation
methods, including both sound and haptic synthesis models, as well as
the physical architecture of the floor itself. A working prototype
has been developed and is currently being refined.
Audioscape: Mobile Immersive Interaction with Sound and Music
This project involves the creation of a compelling experience of
immersive 3D audio for each individual in a group of users, located in
a common physical space of arbitrary scale. The architecture builds
upon our earlier immersive real-time audiovisual framework: a modeled
audio performance space consisting of sounds and computational sound
objects, represented in space as graphical objects. Current and
planned activities include experimentation with different technologies
for low-latency wireless audio communication, a large-scale augmented
reality environment to support immersive interaction, and embedding of
3D video textures (e.g. other human participants) into the displayed
space. This work is being conducted in collaboration with Zack Settel
and Mike Wozniewski.
User interface paradigms for manipulation of and interaction with a 3D
audiovisual environment
We would like to develop an effective interface for object
instantiation, position, view, and other parameter control, which
moves beyond the limited (and often bewilderingly complex) keyboard
and mouse devices, in particular within the context of
performance. The problem can be divided into a number of actions (or
gestures that the user needs to perform), the choice of sensor (to
acquire these input gestures), and appropriate feedback (to indicate
to the user what has been recognized and/or
performed).
Evaluation of Affective User Experience
The goal of this project is to develop and validate a suite of
reliable, valid, and robust quantitative and qualitative, objective
and subjective evaluation methods for computer game-, new media-, and
animation environments that address the unique challenges of these
technologies. Our work in these areas at McGill spans biological and
neurological processes involved in human psychological and
physiological states, pattern recognition of biosignals for automatic
psychophysiological state recognition, biologically inspired computer
vision for automatic facial expression recognition, physiological
responses to music, and stress/anxiety measurement using physiological
data.
Camera-Projector Systems
Virtual Rear Projection
We transform the walls of a room into a single logical display using
front-projection of graphics and video. The output of multiple
projectors is pre-warped to correct misalignment and the intensity
reduced in regions where these overlap to create a uniformly illuminated
display. Occlusions are detected and compensated for in real time,
utilizing overlapping projectors to fill in the occluded region, thereby
producing an apparently shadow-free display. Ongoing work is
aimed at similar capabilities without any calibration steps as well as
using deliberately projected graphics content on the occluding
object to augment interaction with the environment.
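The intensity reduction in overlap regions can be illustrated with a minimal sketch. It assumes two horizontally adjacent projectors and a linear cross-fade over the shared columns; the function name `blend_weights` and its parameters are hypothetical, and a real system would also need to account for each projector's nonlinear intensity response.

```python
def blend_weights(width_a, width_b, overlap):
    """Per-column intensity weights for two side-by-side projectors.

    Projector A covers columns [0, width_a); projector B starts
    `overlap` columns before A ends. Inside the overlap, A ramps
    down linearly while B ramps up, so the weights always sum to 1.
    """
    total = width_a + width_b - overlap
    start_b = width_a - overlap
    weights_a, weights_b = [], []
    for col in range(total):
        if col < start_b:          # only projector A reaches this column
            wa, wb = 1.0, 0.0
        elif col >= width_a:       # only projector B reaches this column
            wa, wb = 0.0, 1.0
        else:                      # overlap: linear cross-fade
            wb = (col - start_b + 1) / (overlap + 1)
            wa = 1.0 - wb
        weights_a.append(wa)
        weights_b.append(wb)
    return weights_a, weights_b


wa, wb = blend_weights(width_a=8, width_b=8, overlap=4)
combined = [a + b for a, b in zip(wa, wb)]  # uniform illumination target
```

Because the two weights sum to one in every column, the combined output is uniform across the seam under the stated linear-response assumption.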
Efficient Super-Resolution Algorithms
Super-resolution attempts to recover a high-resolution image or video
sequence from a set of degraded and aliased low-resolution ones. We
are working on efficient preconditioning methods that accelerate
super-resolution algorithms without reducing the quality of the
results achieved. These methods apply equally to image restoration
problems and compressed video sequences, and have been demonstrated to
work effectively for rational magnification factors.
Dynamic Image Mosaicing with Robustness to Parallax
Image mosaicing is commonly used to generate wide field-of-view results
by stitching together many images or video frames. Existing methods
are constrained by the camera motion model and the amount of overlap required
between adjoining images. For example, they cope poorly with parallax
introduced by general camera motion, translation in non-planar scenes,
or cases with limited overlap between adjacent camera views. Our research
aims to resolve these limitations effectively to support real-time video
mosaicing at high resolution.
Dynamic View Synthesis
Acquiring video of users in a CAVE-like environment and regenerating it
at a remote location poses two problems: segmentation, the extraction of
objects of interest, i.e. people, from the background, and arbitrary view
generation or view synthesis, to render the video from an appropriate
virtual camera. As our background is dynamic and complex, naive
segmentation techniques such as blue screening are inappropriate. However,
we can exploit available geometric information, registering all background
pixels with the environment empty and then, during operation, determine
whether each pixel corresponds to the background through color consistency
tests. Our view synthesis approach is to build a volumetric model through
an efficient layered approach, in which input images are warped into a
sequence of planes in the virtual camera space. For each pixel in each
plane, we determine its occupancy and color through color consistency,
using this to compose the novel image in a back-to-front manner.
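The per-pixel background test can be sketched as follows. This is a simplified illustration that assumes a reference image captured with the environment empty and already registered to the current frame; the Euclidean RGB distance, the threshold value, and the function names are assumptions rather than the project's actual test.

```python
def is_background(pixel, reference, threshold=30.0):
    """True if `pixel` is color-consistent with the registered background."""
    dist_sq = sum((p - r) ** 2 for p, r in zip(pixel, reference))
    return dist_sq <= threshold ** 2


def foreground_mask(frame, background, threshold=30.0):
    """Mark each pixel True where it fails the color consistency test."""
    return [
        [not is_background(p, b, threshold) for p, b in zip(row, bg_row)]
        for row, bg_row in zip(frame, background)
    ]


# Reference captured with the room empty (hypothetical RGB values).
background = [[(10, 10, 10), (200, 200, 200)],
              [(10, 10, 10), (200, 200, 200)]]
# Current frame: one pixel is covered by a person.
frame = [[(12, 9, 11), (90, 60, 40)],
         [(10, 10, 10), (205, 198, 202)]]

mask = foreground_mask(frame, background)
```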
Machine Learning Techniques for Closed-Loop Gestural Interaction
This project seeks to model the dynamics of movement for the purpose of sensorimotor interaction design. The goal is to learn continuous models of movement or gesture, capturing the most salient features of the dynamics as well as the normative ranges of variability, and to do so in a way that facilitates using the movement models in closed-loop interaction. The idea is to facilitate the acquisition and use of internal models of the dynamics in question on the part of users.
Two main approaches are being explored: the learning of movement primitives by a kind of parametric semi-Bayesian nonlinear dynamical system (based on the Dynamic Movement Primitives of Ijspeert, Schaal, and Nakanishi), and the modeling of movement by nonparametric Bayesian dynamical systems. The novel aspect is the tight integration of statistical models with nonvisual feedback designed to aid interaction.
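For readers unfamiliar with Dynamic Movement Primitives, the following is a minimal one-dimensional sketch of the standard formulation: a critically damped spring system pulled toward a goal, perturbed by a forcing term that fades with a decaying phase variable. The gain values, the Euler integration, and the `forcing` callable (which stands in for a learned, normalized weighted sum of basis functions) are illustrative assumptions; the semi-Bayesian learning described above is not shown.

```python
def rollout_dmp(y0, goal, tau=1.0, dt=0.001, duration=2.0,
                alpha_z=25.0, beta_z=6.25, alpha_x=8.0,
                forcing=lambda x: 0.0):
    """Integrate one DMP dimension with Euler steps; return positions."""
    y, z, x = y0, 0.0, 1.0      # position, scaled velocity, phase
    trajectory = [y]
    for _ in range(round(duration / dt)):
        f = forcing(x) * x * (goal - y0)   # vanishes as the phase x -> 0
        dz = (alpha_z * (beta_z * (goal - y) - z) + f) / tau
        dy = z / tau
        dx = -alpha_x * x / tau            # canonical system
        z += dz * dt
        y += dy * dt
        x += dx * dt
        trajectory.append(y)
    return trajectory


# With no forcing term the primitive is a smooth reach to the goal.
path = rollout_dmp(y0=0.0, goal=1.0)
# A (hypothetical) constant forcing profile reshapes the path but,
# because it is gated by the phase, not the end point.
shaped = rollout_dmp(y0=0.0, goal=1.0, forcing=lambda x: 50.0)
```

Since the forcing term is multiplied by the phase, convergence to the goal is preserved regardless of the learned shape, which is one reason this representation suits closed-loop use.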
Undersea Window - High Definition Video Online
The "Undersea Window" will transmit live full broadcast
high definition video from a camera on the undersea VENUS
network, 100m below the surface of the Saanich Inlet on Vancouver
Island, to scientists, educators and the public throughout Canada
and around the world via CA*net 4 and inter-connected broadband
networks. The project will serve as a test bed for subsequent high
definition video camera deployment on the NEPTUNE network in the
Pacific Ocean. This work is being conducted in collaboration with
John Roston, Colin Bradley (UVic), and Emmett Gamroth (UVic).
High-Resolution Video Synthesis from Mixed-Resolution Video
To increase the frame rate at high resolution of CMOS image sensors,
we propose using their non-destructive read-out capabilities to
simultaneously generate high-resolution frames H at frame
rate h and low-resolution frames L at frame rate l > h.
Our method applies an image-processing algorithm to
both sequences in order to synthesize a high-resolution video
sequence S, at high frame rate l, containing the
high-resolution details and the low-resolution motion dynamics. A
motion evaluation algorithm is used to estimate pixel motion in a
coarse manner between the last interpolated (synthesized) high-resolution
frame S_{t-1} and the current low-resolution frame
L_t generated by the camera.
Automated Door Attendant
The ADA is an interactive agent that
serves the role of a simplified secretary, tailored for a university
environment. The agent greets visitors with a "talking head," takes
messages, schedules appointments, and allows the browsing of selected
documents. Components include a video monitor, speaker, microphone,
and camera. The attendant is presently being augmented with an
animated face that allows for dynamic control of its movement in order
to simulate the acts of speaking, turning to look in the direction of
a visitor, and even yawning. We wish to carry out such control of the
head as appropriate to the activity currently taking place.
Peripheral Communications
We consider two problems related to communication between
geographically distributed family members. First, we examine the
problem of supporting peripheral awareness, in order to improve both
emotional well-being and awareness of family activity. This is based
on a field study to determine the role and importance of various
peripheral cues in different aspects of everyday activities. The
results from the study were used to guide the design of our proposed
augmented communications environment. Second, we consider the choice
of mechanism to facilitate the on-demand transition to foreground
communication in such an environment. The design suggests an expansion
of Buxton's taxonomy of foreground and background interaction
technologies to encompass a third class of peripheral communications.
This work is being conducted in collaboration with Yosuke Kinoe.
Disparity from contour for object segmentation with occlusion
A new disparity-based segmentation method is proposed that explores
the static 3D geometry of a background, and produces
disparity-embedded object contours which can be used to separate
objects via a multi-histogram scheme. This method does not require
identical cameras or frame by frame full stereo reconstruction. It has
low computational cost and can be applied to various vision
applications that require object segmentation as a first processing
step. Experimental results show that the proposed method is
able to segment multiple objects despite occlusions.
Hierarchical Image Coding and Region of Interest Selection
We are developing low-complexity hierarchical encoding algorithms
that provide modest data reduction at low cost for transmission over
computer networks. A key feature is that the encoding is progressive,
permitting truncation of the data stream at an arbitrary position with
reduction in image quality rather than loss of content. On a related
theme, we note that transmission of the entire data content of a video
stream does not take into account the potentially diverse interests or
capabilities of heterogeneous clients nor the relative importance of
different components of the scene. Assuming operation on a multicast
network, the challenge here is to ensure that individual client requests
are balanced against overall system constraints, such as total available
server bandwidth and limit of multicast channels. Our long-term goal
is for such region selection to be automated with the assistance of
intelligent agents, possibly given some hints from the user, for example,
"I'm interested in this person's face" or "follow that object."
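The progressive property described above (truncation degrades quality rather than dropping content) can be illustrated with a minimal one-dimensional sketch. It uses a Haar-style average/difference pyramid as a stand-in, which is an assumption and not necessarily the encoding under development; it also assumes the input length is a power of two.

```python
def encode(samples):
    """Return coefficients ordered coarse-to-fine: [mean, details...]."""
    levels = []
    current = list(samples)
    while len(current) > 1:
        pairs = list(zip(current[0::2], current[1::2]))
        levels.append([(a - b) / 2 for a, b in pairs])   # detail terms
        current = [(a + b) / 2 for a, b in pairs]        # coarser signal
    stream = current[:]                  # overall mean comes first
    for details in reversed(levels):     # then coarsest details first
        stream.extend(details)
    return stream


def decode(stream, length):
    """Rebuild `length` samples; coefficients cut off by truncation count as 0."""
    padded = list(stream) + [0.0] * (length - len(stream))
    current = padded[:1]
    pos = 1
    while len(current) < length:
        details = padded[pos:pos + len(current)]
        pos += len(current)
        refined = []
        for avg, d in zip(current, details):
            refined.extend([avg + d, avg - d])
        current = refined
    return current


samples = [8, 6, 2, 4, 9, 9, 1, 3]
stream = encode(samples)
full = decode(stream, len(samples))         # lossless reconstruction
coarse = decode(stream[:2], len(samples))   # truncated: blurrier, still complete
```

A receiver can stop reading the stream at any coefficient and still obtain a value for every sample, only at reduced fidelity.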
Interaction Paradigms in a Large Screen Environment
Virtual interaction metaphors for two-handed control have been studied in
the past primarily in terms of speed and efficiency. We concentrate our
analysis instead on the cognitive effects such metaphors have on users
within a large screen environment. Based on a series of experiments we
determine how best to manage the division of labour between hands in
order to minimize conceptual error. Empirical evidence suggests that the
proficiency of bimanual paradigms, such as toolglasses or pieglasses,
varies according to a number of factors, for instance the amount of
effort required by the non-preferred hand.
Parsing and Interpreting Gestures in a Multimodal Virtual Environment
Human-computer interaction based on the traditional input mode of
keyboard and mouse fails to scale to the demands of large immersive
environments, where users may be standing and moving about the space.
Instead, we propose a gestural interaction paradigm in which users employ
physical gestures to communicate their intentions. We are developing a
framework for the acquisition and parsing of such gestures, using input
from either video camera, data glove, or computer mouse (as a prototype).
The architecture is fully configurable through XML files and uses a
common data type in order to facilitate integration with other software
components distributed over the network.
Statistical Multi-Object Tracking
We are developing a generic object tracker capable of following,
in real-time, multiple objects in a dynamic, real-world, possibly
cluttered environment, in which lighting levels can change dramatically,
for example, a classroom where the instructor walks in front of a
projection screen. Our tracker uses a combination of movement
detection and statistical feature extraction to locate and maintain
objects within the camera's field of view. A final step matches
the various features found in the current image with the objects
previously identified by the system.
Hand and Fingertip Tracking for Gesture Recognition
In augmented reality environments, traditional input interfaces such
as the keyboard-mouse combination are no longer adequate. We turn,
instead, to gestural language, long an important component of human
interaction, employing computer vision techniques to perform hand
tracking and gesture recognition. Our approach employs edge detection
for foreground segmentation and tracks the wrist location with a particle
filter. Based on the wrist location and orientation, we then determine
the positions of the fingertips, exploiting their semi-circular shape by
modelling the fingertip extremities as a circular arc. The fingertips can
be located by looking for maximal responses of a circular Hough transform,
applied to the hand boundary image, followed by several heuristic tests
to filter out false positives and duplicate detections.
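The circular Hough voting step can be sketched in a few lines. This illustration assumes a known fingertip radius and a synthetic semi-circular arc of boundary points; the function name, the angular sampling, and the test data are hypothetical, and the wrist-based search region and heuristic filtering described above are not reproduced.

```python
import math
from collections import Counter


def hough_circle_centers(edge_points, radius, angle_steps=72):
    """Accumulate votes for circle centers at a fixed, known radius.

    Every boundary point votes for all integer grid cells that lie
    `radius` away from it; cells supported by many points are likely
    centers of a circular arc.
    """
    votes = Counter()
    for ex, ey in edge_points:
        for k in range(angle_steps):
            theta = 2 * math.pi * k / angle_steps
            cx = round(ex - radius * math.cos(theta))
            cy = round(ey - radius * math.sin(theta))
            votes[(cx, cy)] += 1
    return votes


# Synthetic fingertip: a half circle of radius 5 centered at (20, 15).
arc = [(20 + 5 * math.cos(math.radians(a)),
        15 + 5 * math.sin(math.radians(a)))
       for a in range(0, 181, 10)]

votes = hough_circle_centers(arc, radius=5)
best_center, best_votes = votes.most_common(1)[0]
```

Only a partial arc is needed for the center to accumulate the maximal response, which is why the method suits fingertips whose lower half merges into the finger.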
Stochastic Parsing with Semantic Constraints in Multimodal Interaction
This project uses typed feature structures and syntactic/semantic
constraints to interpret user actions through arbitrary modes such as
speech, gesture, and handwriting. To this end we have developed a unique
parsing algorithm that takes advantage of this approach to search
through partially specified hierarchical descriptions of user activity.
This algorithm is the core of a larger multimodal framework that can
generically incorporate many existing techniques in multimodal
interaction such as temporal constraints, prosodic effects, and dialogue
management. We intend to demonstrate these capabilities in a handful of
applications, among them a simple multimodal game and a multimodal map
navigation system.
Parallel Distributed Camera Arrays
To provide more robust and efficient object tracking for Intelligent
Environments, we are working with colleagues to create a set of
networked low-cost camera arrays that collectively provide high
resolution and large field-of-view image processing capabilities.
Our approach involves the development of a number of novel technologies,
such as smart cameras with on-board reconfigurable image processing
and network communication capabilities, techniques for cooperative
parallel distributed image processing that are suitable for
multi-camera image data, and techniques for reconstruction of
arbitrary viewpoints from a network of video cameras viewing a
scene. Our present efforts are aimed at developing algorithms to
support an array of cameras for parallel distributed processing of
image sequences. This involves synchronized video acquisition,
monocular processing of the individual images, stereo processing
of nearby pairs, matching and triangulation for depth extraction,
and finally, integration of the stereo information from multiple
pairs to generate a rich model of the objects.
Camera Calibration Methods
We conducted a thorough study investigating the effects of training
data quantity, pixel coordinate noise, training data measurement
error, and the choice of camera model on camera calibration results.
The study includes a detailed comparison of various camera models, in
order to determine the relative importance of the various radial and
decentering distortion coefficients. Tsai's world-reference based
method yielded the most accurate results when trained on data of low
measurement error; such data, however, are difficult to obtain in practice
without an expensive and time-consuming setup. In contrast, Zhang's
planar calibration method, although sensitive to noise in training data,
requires only relative measurements between adjacent calibration points,
which can be accomplished accurately with trivial effort, suggesting
that in the absence of sophisticated measurement apparatus, this may
easily outperform Tsai's method.
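For reference, the radial and decentering coefficients mentioned above are commonly parameterized as in the sketch below, which applies two radial terms (k1, k2) and two decentering (tangential) terms (p1, p2) to normalized image coordinates. The study does not state which parameterization each compared model uses, so this particular form and the function name are assumptions.

```python
def distort(x, y, k1=0.0, k2=0.0, p1=0.0, p2=0.0):
    """Map ideal normalized coordinates (x, y) to distorted ones."""
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 * r2
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return x_d, y_d


ideal = distort(0.3, -0.2)               # all coefficients zero: identity
barrel = distort(0.5, 0.0, k1=-0.2)      # negative k1 pulls points inward
```

Comparing camera models then amounts to asking how much calibration accuracy changes as individual coefficients are fixed to zero.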
Recording Studio that Spans a Continent
On Saturday September 23, 2000, a jazz group performed in a concert hall at
McGill University in Montreal and the recording engineers mixing the
12 channels of audio during the performance were not in a booth at the
back of the hall, but rather in a theatre at the University of Southern
California in Los Angeles.
Intelligent Classroom Project
Classroom presentation technology was augmented with sensors, wired
to computers for context-sensitive processing. Now, rather than
require manual control, the room activates and configures the
appropriate equipment automatically, in response to instructor
activity. For example, when an instructor logs on to the computer,
the system infers that a lecture is being started, automatically
turns off the lights, lowers the screen, turns on the projector,
and switches the projector to computer input. The simple act of
placing an overhead transparency on the document viewer causes the
slide to be displayed and the room lights adjusted to an appropriate
level. Similarly, audiovisual sources such as the VCR or laptop
computer output are displayed automatically in response to activation
cues. Together, these mechanisms assume the role of skilled
operator, taking responsibility for the low-level control of
the technology, thereby freeing the instructor to concentrate on
the lecture itself, rather than the user interface.
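The context-sensitive behaviour described above can be pictured as a mapping from sensed events to equipment actions. The sketch below is only an illustration of that idea; the event and action names are hypothetical and do not reflect the classroom's actual software.

```python
# Hypothetical event -> action rules mirroring the examples in the text.
RULES = {
    "instructor_login": [
        "lights_off", "lower_screen", "projector_on",
        "projector_input_computer",
    ],
    "transparency_on_viewer": ["display_document_viewer", "adjust_lights"],
    "vcr_play": ["display_vcr"],
    "laptop_connected": ["display_laptop"],
}


def react(event):
    """Return the ordered equipment actions triggered by a sensed event."""
    return list(RULES.get(event, []))


actions = react("instructor_login")
ignored = react("door_opened")   # events without a rule trigger nothing
```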
RoboCup Legged Competition
From 1999 through 2002, McGill was the only Canadian university
and one of only four North American schools to participate in the
Sony Legged league of the
RoboCup Competition. This competition pitted our Sony legged robots against
teams from other universities in a "cat-eat-cat" test of artificial
intelligence and soccer skills.
Phidgets Interface
Based on the work of Greenberg and Fitchett, a project group designed and
prototyped an elegant, USB-based I/O system to allow for easy and rapid
development of software that interfaces to analog and digital inputs,
digital outputs, and stepper motor control. The software environment
surrounding this system was initially limited to running under Visual
Basic on Windows systems but we are now extending the libraries with
more advanced graphical capabilities and porting the system to Linux.
GraffitiBoard
The GraffitiBoard is a wall-sized computer display that tracks the
position of a pointer (such as a user's finger) and displays the
resulting penstrokes as if the user were writing on the wall. A
video projector produces the displayed image while a video camera
captures the users' actions. By applying a simple colour tracking
algorithm or a more complex cross-correlation technique, it is
possible to recognize certain actions and respond accordingly. For
example, if the user's hand is placed on the wall, a palette with various painting options
can be generated at that location. For our demonstration program,
we use both colour tracking and correlation techniques to track the
movement of the user's finger and draw
pictures and letters.
UbiVCR
This project uses speech recognition software and video overlay text
messages to provide an intuitive VCR interface. Current projects
include rebuilding a Perl script that generates an electronic TV guide
from the web, improving the grammar to deal with context-sensitive help,
and running a formal experiment comparing the UbiVCR with other VCR-programming
methods. The results of this experiment are to be submitted
to an HCI conference for publication. An offshoot of this work
is a custom speech-based channel selection and inquiry mechanism
for a near-blind individual.
Millenium Exhibit
This project involved the development of two components of a
fictitious house of the future for the Ontario Science Centre.
The exhibit consists of a dining room
and living room scenario. Each room
reacts to user activity, utilizing information from video cameras,
voice recognition, and various low-level sensors, providing output
through synthesized speech, audio and video clips.
Reactive Room
This project (1993-1995) developed a state-of-the-art videoconferencing facility, augmented with various
sensors, which reacted to user activity by automatically selecting
appropriate configurations of audio and video sources. The system
infers the intentions of users and reacts accordingly, allowing
them to conduct both local and videoconference meetings, making
full use of the presentation technology (document camera, VCR,
digital whiteboard) without needing to interact with the computer.
Adaptive File Distribution Protocol
AFDP is a protocol for the efficient and reliable distribution of large
files to many hosts on a LAN or internetwork. The protocol is built on
top of UDP, and uses a rate-based flow control mechanism following the
publishing metaphor.
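Rate-based flow control, in general, paces transmissions by time rather than by a window of outstanding acknowledgements. The sketch below shows only that general idea, computing when each datagram would be sent for a target rate; it is not AFDP's actual mechanism, and the function name and parameters are hypothetical.

```python
def send_schedule(num_packets, packet_bytes, rate_bps, start=0.0):
    """Send times (seconds) that pace fixed-size datagrams at `rate_bps`."""
    interval = packet_bytes * 8 / rate_bps   # seconds between datagrams
    return [start + i * interval for i in range(num_packets)]


# Four 1250-byte datagrams at 1 Mbit/s leave 10 ms apart.
times = send_schedule(num_packets=4, packet_bytes=1250, rate_bps=1_000_000)
```

A sender built on UDP would sleep until each scheduled time before transmitting, and could raise or lower `rate_bps` in response to receiver feedback.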
NOVICE: Neural network robotic control
A robotic system using simple visual processing and controlled by
neural networks was developed. The robot performs docking and target
reaching without prior geometric calibration of its components. All
effects of control signals on the robot are learned by the controller
through visual observation during a training period, and refined during
actual operation. Minor changes in the system's configuration result
in a brief period of degraded performance while the controller adapts
to the new mappings.
Last update: 20 August 2010