Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models [PDF]
We present ODISE: Open-vocabulary DIffusion-based panoptic SEgmentation, which unifies pre-trained text-image diffusion and discriminative models to perform open-vocabulary panoptic segmentation. Text-to-image diffusion models have the remarkable ability
Jiarui Xu+5 more
Side Adapter Network for Open-Vocabulary Semantic Segmentation [PDF]
This paper presents a new framework for open-vocabulary semantic segmentation with the pre-trained vision-language model, named Side Adapter Network (SAN). Our approach models the semantic segmentation task as a region recognition problem. A side network
Mengde Xu+4 more
OpenMask3D: Open-Vocabulary 3D Instance Segmentation [PDF]
We introduce the task of open-vocabulary 3D instance segmentation. Current approaches for 3D instance segmentation can typically only recognize object categories from a pre-defined closed set of classes that are annotated in the training datasets.
Ayca Takmaz+5 more
Scaling Open-Vocabulary Object Detection [PDF]
Open-vocabulary object detection has benefited greatly from pretrained vision-language models, but is still limited by the amount of available detection training data.
Matthias Minderer+2 more
Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP [PDF]
Open-vocabulary semantic segmentation aims to segment an image into semantic regions according to text descriptions, which may not have been seen during training. Recent two-stage methods first generate class-agnostic mask proposals and then leverage pre-
Feng Liang+8 more
Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space [PDF]
Transformer-based language models (LMs) are at the core of modern NLP, but their internal prediction construction process is opaque and largely not understood. In this work, we make a substantial step towards unveiling this underlying prediction process,
Mor Geva+3 more
Towards Open Vocabulary Learning: A Survey [PDF]
In the field of visual scene understanding, deep neural networks have made impressive advancements in various core tasks like segmentation, tracking, and detection.
Jianzong Wu+11 more
Simple Open-Vocabulary Object Detection with Vision Transformers [PDF]
Combining simple architectures with large-scale pre-training has led to massive improvements in image classification. For object detection, pre-training and scaling approaches are less well established, especially in the long-tailed and open-vocabulary ...
Matthias Minderer+13 more
XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models [PDF]
Large multilingual language models typically rely on a single vocabulary shared across 100+ languages. As these models have increased in parameter count and depth, vocabulary size has remained largely unchanged. This vocabulary bottleneck limits
Davis Liang+7 more
PLA: Language-Driven Open-Vocabulary 3D Scene Understanding [PDF]
Open-vocabulary scene understanding aims to localize and recognize unseen categories beyond the annotated label space. The recent breakthrough of 2D open-vocabulary perception is largely driven by Internet-scale paired image-text data with rich ...
Runyu Ding+5 more