Abstract We examine the problem of learning a visual map of the environment while maintaining an accurate pose estimate. Our approach is based on using two robots in a simple collaborative scheme; in practice, one of these robots can be much less capable than the other. In many mapping contexts, a robot moves about collecting data (images, in particular) that are later used to assemble a map; we can think of map construction as a training process. Without outside information, the robot's position estimate accumulates error as it collects training images, corrupting its knowledge of the positions from which observations are taken. We address this problem by deploying a second robot to observe the first one as it explores, thereby establishing a \emph{virtual tether} and enabling an accurate estimate of the exploring robot's position while it constructs the map. We refer to this process as \emph{cooperative localization}. The images collected during this process are assembled into a representation that allows vision-based position estimation from a single image at a later date. In addition to developing the concept and its formalism, we validate the approach experimentally and present quantitative results demonstrating the performance of the method in over 90 trials.