We consider the problem of localizing a robot in an initially unfamiliar environment from visual input. The robot is not given a map of the environment, but it does have access to a limited set of training examples, each of which specifies the video image observed when the robot is at a particular location and orientation. Such data might be acquired by dead reckoning the first time the robot enters an unfamiliar region (using some simple mechanism, such as sonar, to avoid collisions). In this paper, we address a specific variant of this problem for experimental and expository purposes: estimating a robot's orientation (pan and tilt) from sensor data. Performing the scene reconstruction needed to build a metric map of the environment from video images alone is difficult. We avoid this by having the robot learn to convert a set of image measurements into a representation of its pose (position and orientation). This provides a local metric description of the robot's relationship to a portion of a larger environment; a large-scale map might then be constructed from a collection of such local maps. In our experiments, these maps express the statistical relationship between the image measurements and camera pose. The conversion from visual data to camera pose is implemented with a multi-layer neural network trained using backpropagation. For extended environments, a separate network can be trained for each local region. The experimental results reported in this paper for orientation (pan and tilt) suggest that the technique is accurate while its on-line computational cost is very low. Related work is taking place in the context of the IRIS project (below). A recent article appears in Neural Computation.
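The learned mapping described above can be sketched as a small regression network trained with backpropagation. The NumPy example below is a minimal illustration, not the paper's implementation: the input dimension (16 image measurements), hidden-layer size (8 units), and synthetic training data are all assumptions standing in for real image/pose pairs, and the targets are a two-dimensional pose vector (pan, tilt).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training set: 200 examples, each a vector of 16
# "image measurements" with a (pan, tilt) target generated by a
# random linear map plus noise. These dimensions and data are
# illustrative assumptions, not the paper's measurements.
n, d_in, d_hid, d_out = 200, 16, 8, 2
X = rng.normal(size=(n, d_in))
W_true = rng.normal(size=(d_in, d_out))
Y = X @ W_true + 0.01 * rng.normal(size=(n, d_out))

# Network parameters: one tanh hidden layer, linear output.
W1 = 0.1 * rng.normal(size=(d_in, d_hid))
b1 = np.zeros(d_hid)
W2 = 0.1 * rng.normal(size=(d_hid, d_out))
b2 = np.zeros(d_out)

lr = 0.05
losses = []
for epoch in range(1000):
    # Forward pass: image measurements -> predicted pose.
    h = np.tanh(X @ W1 + b1)
    pred = h @ W2 + b2
    err = pred - Y
    losses.append(float(np.mean(err ** 2)))

    # Backward pass: gradients of the mean squared error.
    g_pred = 2.0 * err / err.size
    g_W2 = h.T @ g_pred
    g_b2 = g_pred.sum(axis=0)
    g_h = g_pred @ W2.T
    g_z1 = g_h * (1.0 - h ** 2)   # derivative of tanh
    g_W1 = X.T @ g_z1
    g_b1 = g_z1.sum(axis=0)

    # Gradient-descent update.
    W1 -= lr * g_W1
    b1 -= lr * g_b1
    W2 -= lr * g_W2
    b2 -= lr * g_b2

print(f"training MSE: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Once trained, the forward pass alone converts a measurement vector to a pose estimate, which is why the on-line cost of such a scheme is low: a query is just two matrix-vector products and a tanh.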