Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training
Masked Autoencoders (MAE) have shown great potential in self-supervised pre-training for language and 2D image transformers. However, it remains an open question how to exploit masked autoencoding for learning 3D representations of irregular ...
Renrui Zhang +7 more
PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency
Removing outlier correspondences is one of the critical steps for successful feature-based point cloud registration. Despite the increasing popularity of introducing deep learning techniques in this field, spatial consistency, which is essentially ...
Xuyang Bai +7 more
VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection
Accurate detection of objects in 3D point clouds is a central problem in many applications, such as autonomous navigation, housekeeping robots, and augmented/virtual reality.
Yin Zhou, Oncel Tuzel
Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following
We introduce Point-Bind, a 3D multi-modality model that aligns point clouds with 2D images, language, audio, and video. Guided by ImageBind, we construct a joint embedding space between 3D and multiple modalities, enabling many promising applications, e.g., any-
Ziyu Guo +10 more
PointRCNN: 3D Object Proposal Generation and Detection From Point Cloud
In this paper, we propose PointRCNN for 3D object detection from raw point clouds. The framework is composed of two stages: stage-1 for bottom-up 3D proposal generation and stage-2 for refining proposals in canonical coordinates to obtain ...
Shaoshuai Shi +2 more
PointCLIP: Point Cloud Understanding by CLIP
Recently, zero-shot and few-shot learning via Contrastive Vision-Language Pre-training (CLIP) have shown inspirational performance on 2D visual recognition, which learns to match images with their corresponding texts in open-vocabulary settings. However,
Renrui Zhang +8 more
SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer
Point cloud completion aims to predict a complete shape with high accuracy from its partial observation. However, previous methods usually suffered from the discrete nature of point clouds and the unstructured prediction of points in local regions, which makes it ...
Peng Xiang +6 more
Self-Supervised Pretraining of 3D Features on any Point-Cloud
Pretraining on large labeled datasets is a prerequisite for achieving good performance in many computer vision tasks, such as image recognition and video understanding.
Zaiwei Zhang +3 more
Segment Any Point Cloud Sequences by Distilling Vision Foundation Models
Recent advancements in vision foundation models (VFMs) have opened up new possibilities for versatile and efficient visual perception. In this work, we introduce Seal, a novel framework that harnesses VFMs for segmenting diverse automotive point cloud ...
Youquan Liu +7 more
PC²: Projection-Conditioned Point Cloud Diffusion for Single-Image 3D Reconstruction
Reconstructing the 3D shape of an object from a single RGB image is a long-standing problem in computer vision. In this paper, we propose a novel method for single-image 3D reconstruction which generates a sparse point cloud via a conditional denoising ...
Luke Melas-Kyriazi +2 more

