Synthetically Trained 3D Visual Tracker of Underwater Vehicles

Abstract

We present a method for visually detecting and tracking the 3D pose of autonomous underwater vehicles, which aims to enable robust multi-robot convoying. We follow the approach of tracking-by-detection, which combines the robust, drift-free nature of object detection with the temporal consistency of tracking algorithms. Central to our method is a multi-output convolutional network that jointly predicts whether the target robot is present in the image (classification), the 2D bounding box around the target in the image plane, and the 3D orientation of the target. This, combined with camera intrinsic parameters and prior knowledge of the robot’s absolute scale, allows us to recover the full 6-degree-of-freedom pose (translation and orientation) of the target robot. To train our network, we use only synthetic images rendered using the Unreal game engine, which is a cost-effective way to produce a large training set without the need for laborious manual annotations. Our evaluation analyzes the impact of orientation offset on 3D detection accuracy, and demonstrates successful generalization of the learned model to real underwater photographs of the target robot.
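The translation part of the 6-DoF recovery described in the abstract can be sketched with a pinhole camera model: the target's known physical width and its projected bounding-box width give depth by similar triangles, and back-projecting the box centre gives the lateral offsets. This is an illustrative sketch, not the authors' implementation; the function name, bounding-box convention, and calibration values below are hypothetical.

```python
import numpy as np

def translation_from_bbox(bbox, K, target_width_m):
    """Estimate target translation (X, Y, Z) in the camera frame.

    bbox: (x_min, y_min, x_max, y_max) in pixels.
    K: 3x3 camera intrinsics matrix.
    target_width_m: known physical width of the target robot (metres).
    """
    x_min, y_min, x_max, y_max = bbox
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]

    # Depth from similar triangles: a known width projecting to w_px pixels.
    w_px = x_max - x_min
    z = fx * target_width_m / w_px

    # Back-project the bounding-box centre to metric X, Y at that depth.
    u = 0.5 * (x_min + x_max)
    v = 0.5 * (y_min + y_max)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])

# Example with made-up intrinsics: a 0.5 m wide target spanning 80 px
# at focal length 800 px sits 5.0 m in front of the camera.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
t = translation_from_bbox((280, 200, 360, 280), K, target_width_m=0.5)
# t → [0.0, 0.0, 5.0]
```

In the paper's pipeline, this translation estimate would be combined with the network's predicted 3D orientation to form the full 6-DoF pose; the sketch here ignores refinements such as foreshortening of the box under rotation.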

Publication
OCEANS 2018 MTS/IEEE Charleston