Performing the scene reconstruction needed to build a metric map of the environment from video images alone is difficult. We avoid this by having the robot learn to convert a set of image measurements directly into a representation of its pose (position and orientation). This provides a local metric description of the robot's relationship to a portion of a larger environment; a large-scale map can then be assembled from a collection of such local maps. In our experiments, these maps express the statistical relationship between the image measurements and the camera pose. The conversion from visual data to camera pose is implemented by a multi-layer neural network trained with backpropagation. For extended environments, a separate network can be trained for each local region. The experimental data reported in this paper for orientation (pan and tilt) suggest that the accuracy of the technique is good while the on-line computational cost is very low.
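As a minimal sketch of the approach described above, the following trains a small multi-layer network with backpropagation to map a vector of image measurements to a (pan, tilt) pose estimate. The network size, learning rate, and synthetic training data are illustrative assumptions; they are not the configuration used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for training data: 200 samples of 16 image
# measurements whose true (pan, tilt) pose is a smooth function of
# the measurements. Real data would pair measured image features
# with known camera poses for one local region.
X = rng.uniform(-1.0, 1.0, size=(200, 16))
true_W = rng.normal(scale=0.3, size=(16, 2))
Y = np.tanh(X @ true_W)                      # targets: (pan, tilt)

# One hidden layer of tanh units, linear output layer.
W1 = rng.normal(scale=0.1, size=(16, 32)); b1 = np.zeros(32)
W2 = rng.normal(scale=0.1, size=(32, 2));  b2 = np.zeros(2)
lr = 0.05

for epoch in range(2000):
    # Forward pass: image measurements -> pose estimate.
    H = np.tanh(X @ W1 + b1)
    P = H @ W2 + b2
    err = P - Y
    # Backpropagate mean-squared-error gradients through both layers.
    gW2 = H.T @ err / len(X); gb2 = err.mean(axis=0)
    dH = (err @ W2.T) * (1.0 - H**2)         # tanh derivative
    gW1 = X.T @ dH / len(X);  gb1 = dH.mean(axis=0)
    # Gradient-descent weight updates.
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

mse = float((err**2).mean())
print(f"final training MSE: {mse:.4f}")
```

Once trained, evaluating the network on a new measurement vector costs only two small matrix multiplications, which is consistent with the very low on-line computational cost noted above. One such network would be trained per local region of an extended environment.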
Related work is taking place in the context of the IRIS project (below). A recent article appears in Neural Computation.

G. Dudek, C. Zhang