25 April 2022

What Does the Future Look Like? Self-supervised 3D Point Cloud Prediction

IGG-Blogpost Series | Working Group Photogrammetry

Most autonomous cars use 3D laser scanners, so-called LiDARs, to perceive the 3D world around them. A LiDAR generates local 3D point clouds of the scene around the car, and a typical sensor produces around 10 such point clouds per second. These point clouds are widely used in robotics and autonomous driving tasks such as localization, object detection, obstacle avoidance, mapping, scene interpretation, and trajectory prediction.

Given a sequence of past point clouds [red] at time T, the goal is to predict the F future scans [blue]. © IGG | Photogrammetry


In our recent work, presented at CoRL 2021 by Benedikt Mersch and with source code available, we address the problem of predicting large, unordered future point clouds from a given sequence of past scans. Because 3D point cloud data is high-dimensional and sparse, point cloud prediction is a challenging problem that is not yet fully explored. A future point cloud can be estimated either by applying a predicted future scene flow to the last received scan or by generating a new set of future points. Mersch et al. focus on the latter: generating new point clouds to predict the future scene.

In contrast to existing approaches, which rely on recurrent neural networks to model temporal correspondences, we use 3D convolutions to jointly encode spatial and temporal information. Our approach takes as input a new 3D representation built from concatenated range images. For each of multiple future time steps, it jointly estimates a future range image and a per-point score indicating whether each point is valid or invalid. Skip connections preserve structural details of the environment, and circular padding keeps the prediction consistent across the horizontal wrap-around of the range image. The method provides more accurate predictions than other state-of-the-art approaches for point cloud prediction.
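To make the range image representation and the circular padding more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the published implementation: the function names, the tiny image size, the vertical field of view (3° up, 25° down), and the `-1.0` marker for empty pixels are all hypothetical choices made for this example.

```python
import math

def project_to_range_image(points, height=4, width=8,
                           fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project (x, y, z) points onto an H x W range image.

    Pixels that receive no point keep the value -1.0 (assumed convention).
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down
    image = [[-1.0] * width for _ in range(height)]
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue  # the sensor origin has no defined direction
        yaw = math.atan2(y, x)    # azimuth angle in [-pi, pi]
        pitch = math.asin(z / r)  # elevation angle
        # The azimuth maps to the column; the modulo wraps the 360-degree seam.
        col = int(math.floor(0.5 * (1.0 - yaw / math.pi) * width)) % width
        # The elevation maps to the row; row 0 is the top of the field of view.
        row = int(math.floor((1.0 - (pitch - fov_down) / fov) * height))
        if not 0 <= row < height:
            continue  # point lies outside the vertical field of view
        # If several points fall into one pixel, keep the closest one.
        if image[row][col] < 0.0 or r < image[row][col]:
            image[row][col] = r
    return image

def circular_pad_width(image, pad):
    """Pad each row with columns taken from the opposite image border."""
    if pad == 0:
        return [list(row) for row in image]
    return [row[-pad:] + row + row[:pad] for row in image]

# Concatenating the range images of past scans along a time axis yields
# a (time, height, width) volume that 3D convolutions can operate on.
past_scans = [[(10.0, 0.0, 0.0)], [(9.5, 0.0, 0.0)]]
volume = [project_to_range_image(scan) for scan in past_scans]
padded = [circular_pad_width(img, 1) for img in volume]
```

Because the left and right borders of a rotating LiDAR's range image show neighboring directions, padding with columns from the opposite border gives a convolution the same context at the seam as anywhere else in the image.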

This approach predicts detailed future point clouds of varying sizes with fewer parameters to optimize, which results in faster training and inference. Furthermore, the approach is fully self-supervised and does not require any manual labeling of the data. In sum, the approach predicts a sequence of detailed future 3D point clouds from a given input sequence through fast joint spatio-temporal processing with temporal 3D convolutional networks. It outperforms state-of-the-art point cloud prediction approaches, generalizes well to unseen environments, and operates online faster than the frame rate of a typical rotating 3D LiDAR sensor.
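The following sketch illustrates how a predicted range image and per-point validity scores could be turned into a point cloud whose size varies with the number of valid pixels. It is a simplified illustration, not the authors' code: the function name, the 0.5 score threshold, the field of view, and the use of pixel centers as ray directions are assumptions made for this example.

```python
import math

def range_image_to_points(ranges, valid_scores, threshold=0.5,
                          fov_up_deg=3.0, fov_down_deg=-25.0):
    """Back-project range pixels whose validity score exceeds the threshold."""
    height, width = len(ranges), len(ranges[0])
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    points = []
    for row in range(height):
        # Elevation angle of the pixel center; row 0 is the top of the view.
        pitch = fov_up - (row + 0.5) / height * (fov_up - fov_down)
        for col in range(width):
            if valid_scores[row][col] <= threshold:
                continue  # pixel is predicted to contain no valid point
            # Azimuth angle of the pixel center.
            yaw = math.pi * (1.0 - 2.0 * (col + 0.5) / width)
            r = ranges[row][col]
            points.append((r * math.cos(pitch) * math.cos(yaw),
                           r * math.cos(pitch) * math.sin(yaw),
                           r * math.sin(pitch)))
    return points

# Toy prediction: a 2 x 4 range image with a constant range of 5 m.
predicted_ranges = [[5.0] * 4 for _ in range(2)]
predicted_scores = [[0.9, 0.1, 0.8, 0.2],
                    [0.0, 0.0, 0.0, 0.7]]
cloud = range_image_to_points(predicted_ranges, predicted_scores)
```

Only pixels with a score above the threshold contribute a point, so the size of the resulting cloud follows from the prediction itself. The training targets for both outputs can be taken from the recorded future scans, which is why no manual labels are needed.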

Links