Learning to do Vision by Inverting a Graphics Model

Vinod Nair
Dept. of Computer Science, University of Toronto

September 2, 2005 at 2:00 PM
Zames Seminar Room - MC437

I'll describe a fairly general way of inverting graphics programs to produce vision programs. Suppose that we have a graphics program that takes a small number of parameters as input and generates an image. We want to train a neural network that can take an image as input and recover the graphics parameters that generated it. The key problem is that we typically don't know what parameter values generated the real-world images that we're interested in, so there is no labelled training set of image-parameter pairs for doing supervised learning. I'll describe a new unsupervised learning algorithm that gets around this problem. The algorithm is demonstrated on the task of modelling images of handwritten digits. Using a graphics program that can generate realistic digit images, a neural network is trained for each digit class to recover graphics parameters from images. Digit classification can then be performed by seeing how accurately the parameters extracted by each class-specific network reconstruct a test image: the image is assigned to the class whose network gives the best reconstruction. This approach achieves a classification error of 1.82% on the MNIST digits database.
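
The classification-by-reconstruction step described above can be illustrated with a minimal sketch. Everything in it is a hypothetical stand-in rather than the model from the talk: the "graphics program" (`render`) just scales a fixed per-class template, and the class-specific "network" (`infer_params`) is a least-squares projection instead of a trained neural network. Only the decision rule, picking the class with the lowest reconstruction error, follows the abstract.

```python
import numpy as np

# Hypothetical per-class templates; assumed for illustration only.
TEMPLATES = {
    0: np.array([1.0, 1.0, 0.0, 0.0]),
    1: np.array([0.0, 0.0, 1.0, 1.0]),
}

def render(label, params):
    """Toy graphics program: parameters -> image (scaled class template)."""
    return params["scale"] * TEMPLATES[label]

def infer_params(label, image):
    """Stand-in for a class-specific network: image -> parameters.

    Uses a least-squares projection onto the class template, not a
    trained neural network.
    """
    t = TEMPLATES[label]
    return {"scale": float(image @ t) / float(t @ t)}

def classify(image):
    """Return the class whose inferred parameters best reconstruct the image."""
    errors = {}
    for label in TEMPLATES:
        recon = render(label, infer_params(label, image))
        # Squared reconstruction error for this class.
        errors[label] = float(np.sum((image - recon) ** 2))
    best = min(errors, key=errors.get)
    return best, errors

# Usage: a noisy image that mostly resembles the class-1 template.
label, errors = classify(np.array([0.1, 0.0, 0.9, 1.1]))
```

In this toy setting the squared pixel error serves as the reconstruction measure; the abstract does not say which measure the actual system uses.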

Joint work with Geoffrey Hinton.