Course Outline
Topics in Computer Science: Computational Perception
COMP 598 Winter 2013
Instructor: Professor Michael Langer
Office: McConnell Engineering, rm. 329
Tel: 514-398-3740
Email: langer@cim.mcgill.ca
Office Hours: by appointment (send me email)
Overview
This is a 3-credit undergraduate/graduate version of the 4-credit
graduate-only course COMP 646 Computational Perception. The lectures
will coincide for the two courses. The difference between the courses
is that COMP 646 has a Term Paper and an Oral Presentation, which
account for 1 more credit. Please see the COMP 646 web page for
details on the course contents and announcements.
The course examines fundamental computational problems in visual and
auditory perception. Unlike traditional perception courses offered in
Psychology or Physiology departments, which emphasize neural
mechanisms, this course emphasizes computational aspects of
perception. What computational problems do our brains solve when we
see and hear? How can we describe the neural solutions to these
problems using computational models that are specific enough
that they can be programmed on a computer?
The course examines two sensory systems: vision and audition. For
both, we begin by examining the measurement of physical signals from
the environment, namely visual and auditory images, and the
information that is contained in these images. For vision, we
consider image properties such as blur, color, shading, binocular
disparity, motion, texture, and perspective. For audition, we
consider information carried by impact versus non-impact sounds,
echoes, as well as binaural timing and intensity differences. In both
cases, we examine how images are processed by the sensory system. We
express the computations using signal processing tools such as linear
systems theory. For vision, we consider processing that occurs
in the retina and in the cortex. For audition, we discuss how the head
and ear transform sounds, how the cochlea decomposes sound waves into
frequency bands, and how the signals from the two ears are combined in
the brain.
For both vision and audition, we examine - at an abstract
computational level - how the brain infers properties of the
environment from the images. For vision, we consider how depth
and surface material properties are estimated. For audition, we
consider how the direction of a sound source is estimated.
Prerequisites
Students are expected to be comfortable programming in a high-level
programming language, at least at the level of COMP 250, and should
also be very comfortable with the basic mathematics needed for an
undergrad degree in computer science, in particular:
- multivariable calculus (MATH 222 or equivalent)
- linear algebra (MATH 223 or equivalent); in particular, complex
numbers play an important role in signal analysis and so students
should be ready to use them.
- probability and statistics (MATH 323 would be great, but it is
more than you need -- the bare minimum is some familiarity with
normal/Gaussian distributions and basic definitions such as
mean, variance, and conditional probability)
The course is intended for upper-level undergraduates or graduate
students in either the School of Computer Science (SOCS) or the Dept.
of Electrical and Computer Engineering (ECE). I also welcome students
from other departments, such as Psychology or Music, assuming their
math skills and background are comparable to those of an advanced
undergraduate in CS.
Note that this is a 500-level course and so I am assuming that
undergraduates who are registered for it are honours students or at
least meet the requirements of being in honours, namely have a GPA of
at least 3.0.
As for prerequisite knowledge of the subject: the course will cover
the basic psychology and physiology of vision and audition. It will
also cover the basic tools of linear systems theory. No prior
knowledge of visual or auditory psychology, physiology, or linear
systems theory is assumed.
Lecture Notes and Readings
The course consists of lectures given by the instructor. Slides and
lecture notes by the instructor will be made available as PDFs on the
course web page. Readings will also be made available in electronic
form or as handouts. There is no textbook for the course.
Students registered for COMP 646 will give presentations, but
attendance for these is not required for students taking COMP 598.
Evaluation
For COMP 598, there are two components to the grade (total 100%):
- Assignments (40 %)
- There will be three assignments. Each will involve some
MATLAB programming, and questions related to the lectures or to
specific research articles related to the lectures. Students are not
required to know MATLAB prior to the course.
- Exams (60 %)
- There will be two in-class exams and a final exam, which will
take place during the final exam period. The two in-class
exams will each be worth approximately 15% and the final exam will be
worth approximately 30%. If the grade on the final exam is
greater than the grade on the two in-class exams, then the final
exam will be worth 60% of the final course grade.
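As a rough illustration of how the weights above combine, here is a minimal Python sketch. It is not an official grade calculator. It assumes that each in-class exam counts for exactly 15% and the final for exactly 30% (the outline says "approximately"), and that "the grade on the two in-class exams" means their average. The function name `course_grade` and its parameters are made up for this example.

```python
def course_grade(assignments, exam1, exam2, final):
    """Combine component scores (each a percentage, 0-100) into a course grade.

    Assumed weights: assignments 40%, each in-class exam 15%, final 30%.
    Assumed rule: if the final beats the AVERAGE of the two in-class
    exams, the final alone carries the whole 60% exam component.
    """
    in_class_average = (exam1 + exam2) / 2
    if final > in_class_average:
        exam_part = 0.60 * final
    else:
        exam_part = 0.15 * exam1 + 0.15 * exam2 + 0.30 * final
    return 0.40 * assignments + exam_part


if __name__ == "__main__":
    # Final (90) is above the in-class average (65), so it counts for 60%.
    print(course_grade(80, 60, 70, 90))
    # Final (70) is below the in-class average (85), so all exams count.
    print(course_grade(80, 90, 80, 70))
```

Under these assumptions, a strong final can only raise the exam component, never lower it.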
Course Topics
A brief description of some of the topics covered in this course
follows. For more information, see the COMP 646 web page.
Vision
- Photometry: Pixels measure the intensity of light. These measurements
are limited by the dynamic range of the sensor, noise, and wavelength
sensitivity.
- Color: The light reflected from objects is composed of energy at
different wavelengths. These wavelength distributions depend on the
light source (e.g. sun vs. light bulb) as well as on the surface
material (pigments). Vision systems have photoreceptors that are
wavelength specific. How can the intensities that are measured by
different photoreceptors be used to disentangle the wavelength
properties of the light source from those of the reflecting surfaces?
- Binocular stereopsis: Binocular vision is the problem of taking two
images of a scene from two different vantage points and using the
slight positional differences in the two images to compute the relative
depth of objects in the scene. What are corresponding pixels in a pair
of images? What does positional disparity tell a vision system about
depth?
- Spatial vision: A basic task in vision is to find boundaries in
images, where properties of the image intensities change. Such
boundaries often indicate important events such as object edges in the
3D scene. Edges are often indicated by a change in image intensity,
color, or texture. These changes are often difficult to detect reliably
in small local areas, however, and one needs to group information
across regions.
- Texture: Many 3D objects contain rough or pigmented surfaces such
that the reflected light intensities vary significantly from point to
point. In the image, the projected boundaries of such objects cannot be
found by finding brightness edges. Rather, the boundaries must be found
by examining various higher-order properties of images, such as changes
in oriented structure or contrast.
- Motion: Images can vary over time, either because objects are moving
in the scene or because the camera is moving. Accurate estimates of
motion can be used to perceive the motion of the observer and motion of
objects.
- Shading and shape: Image intensities are determined by relationships
between the 3D surface shape and the lighting. We will examine several
models of lighting and shading and ask how the lighting and surface
shape themselves can be inferred from image intensities.
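To make the binocular stereopsis question above concrete ("What does positional disparity tell a vision system about depth?"), here is a minimal Python sketch of the standard textbook relation for two parallel pinhole cameras: depth = focal length x baseline / disparity. This is an illustration under simplifying assumptions (parallel optical axes, matched pixels already known) and is not necessarily the model used in the lectures. The function name and the example numbers are invented for this sketch.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth (metres) of a scene point seen by two parallel pinhole cameras.

    disparity_px: horizontal shift of the point between the two images (pixels)
    focal_px:     focal length expressed in pixels (assumed equal for both cameras)
    baseline_m:   distance between the two camera centres (metres)
    """
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity in this model.
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Hypothetical values: 500-pixel focal length, 6.5 cm baseline.
    for d in (5, 10, 20):
        print(d, depth_from_disparity(d, focal_px=500, baseline_m=0.065))
```

In this model depth is inversely proportional to disparity, so nearby points shift a lot between the two images and distant points shift very little.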
Audition
- Properties of sounds: Sounds are pressure waves and as such can be
analyzed in terms of their frequency components over time. Such an
analysis should be familiar to anyone who has studied musical notation.
A similar frequency-time analysis can be defined for natural sounds as
well, especially for speech.
- Auditory pathways in human hearing: Anatomy of the ear. How is sound
measured and coded?
- Spatial hearing: The sound that arrives at the ear drum is not the
same as the sound that is emitted from the source, but rather has been
shaped by the head and ear. This reshaping of the sound is useful for
perceiving the spatial direction of a sound.
- Echolocation and echorecognition: Bats and dolphins are able to
localize and identify objects by emitting sounds and listening for the
reflections. What are the strategies involved?
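For the spatial hearing topic above, the binaural timing difference mentioned in the Overview can be turned into a direction estimate with a very simple model. The Python sketch below treats the two ears as two points in free space and a distant source, so the interaural time difference (ITD) is d * sin(azimuth) / c. It deliberately ignores the shaping of sound by the head and ear that the course discusses, and the ear spacing (0.18 m) and speed of sound (343 m/s) are assumed round numbers, not values from the course.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 C (assumed value)


def azimuth_from_itd(itd_s, ear_distance_m=0.18):
    """Estimate source azimuth (degrees) from an interaural time difference.

    Simplified two-point, far-field model: itd = d * sin(azimuth) / c.
    Positive ITD means the sound reached the reference ear later, so the
    sign convention is arbitrary here and only illustrative.
    """
    x = SPEED_OF_SOUND * itd_s / ear_distance_m
    # Clamp so that measurement noise cannot push asin outside its domain.
    x = max(-1.0, min(1.0, x))
    return math.degrees(math.asin(x))


if __name__ == "__main__":
    for itd_us in (0, 100, 300, 500):
        print(itd_us, "microseconds ->", azimuth_from_itd(itd_us * 1e-6), "degrees")
```

Even in this crude model, time differences of a few hundred microseconds span most of the range of horizontal directions, which hints at the timing precision the auditory system needs.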
Academic Integrity [statement below is required on all Course Outlines]
McGill University values academic integrity. Therefore, all students
must understand the meaning and consequences of cheating,
plagiarism and other academic offences under the Code of Student
Conduct and Disciplinary Procedures. See www.mcgill.ca/integrity
for more information, as well as the Student Guide to Avoid Plagiarism.
http://www.mcgill.ca/integrity/studentguide